PROTEIN DESIGN FOR RECEPTOR-LIGAND RECOGNITION AND BINDING

CROSS-REFERENCE TO RELATED APPLICATION 
This application claims the benefit of provisional U.S. Appln. No. 60/468,270, 
filed May 7, 2003; which is incorporated by reference herein. 

FEDERAL GOVERNMENT SUPPORT 
This invention was made with federal government support under grant
GM049871 awarded by the National Institutes of Health, grant N00014-01-1-0238
awarded by the Office of Naval Research, and grant F49620-02-0063 awarded by the
Defense Advanced Research Projects Agency. The U.S. Government has certain rights
in the invention.

FIELD OF THE INVENTION 
Formation of a complex between a receptor and its ligand is fundamental to
biological processes at the molecular level. Manipulation of molecular recognition
between a ligand and its receptor is therefore important for study of biological
phenomena (1) and has numerous applications, including, but not limited to, construction of
improved or novel enzymes (2-5), biosensors (6, 7), genetic circuits (8), signal
transduction pathways (9), and chiral separations (10). Preliminary results were published
by us in Looger et al. (11).

BACKGROUND OF THE INVENTION 
The most commonly used methods for altering specificities are empirical, using 
either the immune system to generate antibodies (13), directed evolution or gene 
shuffling (14), or screening of large libraries for altered functionality (15). These 
approaches lose generality either because they are limited to a particular class of
proteins (antibodies), or because of constraints in the sequence diversity and
methodologies available (selection by directed evolution or gene shuffling, library screening). In
practice, it is typically possible to screen protein libraries fully degenerate at no more
than 10, and certainly no more than 15, positions (16). Structure-based, rational design techniques
potentially offer enormous generality for manipulating protein structure and function
(17). Generality arises from (i) the ability to describe any chemical structure (scaffold
or target ligand), and (ii) the use of computational algorithms that can address
combinatorial search spaces that vastly exceed those addressable empirically (16).

The general principles for the formation of specific complexes are understood in
considerable depth (18), and involve a lock-and-key fit between ligand and receptor,
the structure of which is determined primarily by short-range interactions (e.g., steric
contacts, hydrogen bonds). Complex formation is thermodynamically driven primarily
by hydrophobic effects (19), long-range electrostatics (20, 21), and possibly by
differences between protein interiors and solvent in the strength of the short-range
interactions (22). Difficulties in structure-based computational design arise from
limitations in the description of the molecular interactions (23) and the combinatorial
complexity of the problem (16, 24). Despite notable advances in the rational
manipulation of protein sequence and stability using automated computational design tools (16),
prior to this invention the computational design of ligand-binding properties has been
limited to metal centers (5), changes in binding specificity in which much of the
chemical character of the wild-type ligand is retained (2, 9, 25), or larger changes in
binding specificity which resulted in relatively weak binding (3).

In comparison to the selection of enzymes by catalytic antibodies, this invention
has several advantages. For catalytic antibodies, the ligand, which is a transition-state
analog for the target chemical reaction, must possess sufficient stability and
antigenicity to induce an antibody response, yet it must be nontoxic to the immunized
animal and not cause undesirable biological effects. In contrast, the design algorithms
of this invention do not require chemical synthesis of the ligand or its administration
to an animal because the ligand can be manipulated in silico. The efficiency of this
invention is shown by the proportion of designs that successfully bind ligand and/or
catalyze a reaction, whereas a large number of hybridomas are typically screened to
select an antibody with modest catalytic activity. Furthermore, the proteins designed
by this invention can be synthesized with one or more non-natural residues which form
peptide bonds, side chains thereof, post-translational modifications, and combinations
thereof instead of relying on antibody-producing cells which are capable of only
natural protein synthesis.

SUMMARY OF THE INVENTION
It is an objective of the invention to provide processes for the protein
structure-based design or redesign of receptor-ligand interfaces (ligand-binding sites) in which a
ligand is recognized and bound. Receptors designed in this manner can then be
manufactured or used to engineer cells, tissues, or organisms. They can be further evaluated
by empirical methods (e.g., ligand recognition and binding, gene expression, signaling
pathways, catalysis), subjected to further improvement, and/or the process can be
iterated in multiple cycles (further comprising a consideration of quantitative
structure-activity relationship data).

The invention thus relates to a process for protein design in accordance with
spatial and energy relationships between a proteinaceous receptor and a ligand. The
process can comprise (a) generating a collection of ligand poses to provide a Docking
Zone that represents potential conformations and degrees of freedom of the ligand
relative to the receptor; (b) generating a collection of amino acid side-chain conformations
on the backbone of the receptor to provide an Evolving Zone; (c) calculating a cost
function (e.g., atomic interaction(s) between ligand poses of the Docking Zone and
amino acid side chains of the Evolving Zone, and between amino acid side chains of
the Evolving Zone); (d) generating a collection of candidate receptor designs with
ligand binding sites by selecting from combinations of the ligand poses and the amino
acid side chains one or more of the combinations that correspond to optimal or
near-optimal values of the cost function; and optionally (e) rank-ordering the candidate
receptor designs of the collection resulting from (d) by a fitness metric to identify one
or more candidate receptor designs that potentially bind to the ligand. Binding to the
ligand of the one or more candidate receptor designs can then be confirmed;
alternatively, the ligand may be an analog which is bound or a reactive substrate or product of
an enzyme.
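
Steps (a) through (e) can be illustrated with a deliberately small sketch. It is not the implementation used by the invention: the exhaustive enumeration, the function names (`design_receptors`, `cost`, `fitness`), the `tolerance` parameter, and the toy scoring table are all assumptions made for illustration, and the toy cost omits the side chain to side chain terms that step (c) also allows.

```python
from itertools import product


def design_receptors(poses, rotamers_by_position, cost, fitness, tolerance=1.0):
    """Toy version of steps (a)-(e) using exhaustive enumeration.

    poses                 -- ligand poses (the Docking Zone), step (a)
    rotamers_by_position  -- one list of candidate side chains per position
                             (the Evolving Zone), step (b)
    cost(pose, chains)    -- cost function for one combination, step (c)
    fitness(pose, chains) -- metric used to rank retained designs, step (e)
    tolerance             -- how far above the optimum a design may score
                             and still be kept (hypothetical parameter)
    """
    scored = [
        (cost(pose, chains), pose, chains)
        for pose in poses
        for chains in product(*rotamers_by_position)
    ]
    best = min(score for score, _, _ in scored)
    # Step (d): keep optimal and near-optimal combinations.
    candidates = [(pose, chains) for score, pose, chains in scored
                  if score <= best + tolerance]
    # Step (e): rank the candidates, highest fitness first.
    return sorted(candidates, key=lambda pc: fitness(*pc), reverse=True)


# Hypothetical toy data: two poses, two positions with two options each.
poses = ["pose_A", "pose_B"]
rotamers = [["Ala", "Ser"], ["Leu", "Phe"]]
pair_cost = {("pose_A", "Ser"): -2.0, ("pose_A", "Phe"): -1.0,
             ("pose_B", "Leu"): -1.5}


def cost(pose, chains):
    # Sum of pose/side-chain terms; unlisted pairs contribute nothing.
    return sum(pair_cost.get((pose, c), 0.0) for c in chains)


def fitness(pose, chains):
    # Simplest possible fitness metric: lower cost is fitter.
    return -cost(pose, chains)


designs = design_receptors(poses, rotamers, cost, fitness)
```

Real search spaces are far too large for enumeration of this kind, which is why the cost function is evaluated only over a subset of combinations in practice.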

Some improvements of the invention over the prior art are the use of the Docking
Zone and the Evolving Zone in calculating atomic interactions between receptor and
ligand (i.e., potential function) from a subset of all possible combinations; evaluation of
the hydrogen bond inventory of the ligand and/or binding surface inventory of the
receptor-ligand interaction; and algorithms to rank-order and select pairs of ligands and
mutated receptors. Further mutations in the receptor may be introduced outside its
ligand binding site to stabilize the protein, to increase affinity for ligand, to improve
catalysis, or a combination thereof because the further mutations act on residues in the
Evolving Zone.

The process can be implemented as a computer system or stored on a tangible
medium. Protein designed by the invention and made by chemical synthesis or
translation; nucleic acid encoding that protein; an expression vector comprised of that
nucleic acid; and an engineered cell, tissue, or non-human organism are other
embodiments of the invention.

Further aspects of the invention will be apparent to a person skilled in the art
from the following description and claims, and generalizations thereto.

DESCRIPTION OF THE DRAWINGS 
Figure 1 shows an embodiment of the invention. The flowchart highlights major
stages in the Receptor Design algorithm: (i) preparation of target ligand, including
force field and structural descriptions; (ii) preparation of design scaffold, including
identification of target binding site, docking grid, and docking hull; (iii) construction
of CLIPs (Compatible Ligand Poses), to represent the ensemble of all possible
compatible poses of the target ligand within the target binding site; (iv) generation of a family
of complementary surfaces against the CLIPs; and (v) refinement of this family of
complementary surfaces by well search of related sequences, ranking by receptor-ligand
interface estimators, and design cycle feedback from experimental characterization of
designed receptors.

Figure 2 shows the conformational equilibrium of the periplasmic binding
protein (PBP) superfamily, and target ligands and structurally-related compounds. (A)
Ribose-binding protein is shown as representative of the protein superfamily. Ribose
binding mediates a transition from an open (left) to a closed (right) conformation (62,
86). The protein has two domains (I, amino terminal; II, carboxy terminal) linked by a
hinge region (H). Fluorescence intensity changes of an environmentally sensitive,
thiol-reactive fluorescent dye (shown as a solid sphere near the hinge region) coupled to a
mutant cysteine at position 265 monitor ligand binding (7). Calculations use the closed
structure, mutating the PCS residues, and docking the target ligands into the convex
hull (shown only as edges). (B) Structures of target ligands and structurally related
decoys used to probe the specificity of the designed receptors.

Figure 3 shows stereo views of representative designed ligand-binding sites: (A) 
TNT.R3; (B) Lac.R1; (C) Lac.H1; (D) Stn.A1 (dashed lines: hydrogen bonds between
protein and ligand; numbers: side chains close to the ligand). TNT.R3 and Lac.R1 are
presented in the same orientation, illustrating the adaptability of the RBP scaffold to
bind different ligands. The Lac.R1 and Lac.H1 structures illustrate that the same ligand
can be bound by sites designed in different scaffolds. 

Figure 4 shows fluorescence data for a representative designed receptor Lac.R1.
(A) Fluorescence emission spectra for apo (closed circle) and L-lactate-saturated (open 
circle) protein solution. (B) Fluorescence emission intensity at 470 nm is shown as a 
function of L-lactate concentration. The fluorescence titration profile is fit to a single- 
site binding isotherm (7). 
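
As an illustration of what such a fit models, the following minimal sketch evaluates a single-site binding isotherm in its standard hyperbolic form. The exact expression used in reference (7) is not reproduced in this description, so the form below, the function name, and the parameter names are assumptions; it also assumes the free ligand concentration equals the total ligand concentration (no ligand depletion).

```python
def single_site_signal(ligand_conc, kd, f_apo, f_sat):
    """Fluorescence predicted by a single-site binding isotherm.

    ligand_conc -- ligand concentration (same units as kd)
    kd          -- dissociation constant
    f_apo       -- fluorescence of ligand-free (apo) protein
    f_sat       -- fluorescence of ligand-saturated protein
    """
    fraction_bound = ligand_conc / (kd + ligand_conc)
    return f_apo + (f_sat - f_apo) * fraction_bound
```

When the ligand concentration equals Kd, half of the sites are occupied and the predicted signal lies midway between the apo and saturated values.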

Figure 5 shows thermostability data for a representative subset of designed 
receptors. Experiments were conducted in 20 mM sodium phosphate and 150 mM 
sodium chloride, pH 7.0; protein concentration was 10 µM. Ellipticity was monitored at
222 nm. Measured Tm values for mutants: TNT.A1 (circle), 52°C; TNT.R1, 42°C; TNT.R2
(square), 54°C; TNT.H1, 46°C; Lac.A3 (diamond), 46°C; Lac.G2, 50°C; Lac.H1
(triangle), 45°C. These results show that the mid-point transitions fall within 2-15°C of
the wild-type proteins (wild-type Tm values are: RBP, 58°C; GBP, 59°C; HBP, 58°C; ABP,
54°C; QBP, 62°C), and that the degree of cooperativity of the designed receptors is
similar to that of the wild-type receptors.

Figure 6 shows ligand-binding specificity data for the designed receptors: (A) 
TNT, (B) L-lactate, (C) serotonin, and (D) D-lactate. Almost all of the designed 
receptors show a stronger affinity for their target ligands relative to structurally-related 
decoys, consistent with correct modeling of the receptor-ligand complex. Results are
reported as the free energy difference, $\Delta\Delta G_b$, relative to the target ligand
($\Delta\Delta G_b = RT \ln(K_d(\mathrm{decoy})/K_d(\mathrm{target}))$; $\Delta\Delta G_b > 0$
indicates preference for target ligand). $RT \approx 0.6$ kcal/mol. A ten-fold difference in
affinity corresponds to approximately 1.4 kcal/mol of
binding specificity. Target ligands and protein scaffolds are denoted using single-letter
abbreviations. Ligands: TNT, T; L-lactate, L; serotonin, S; D-lactate, D. Scaffolds:
RBP, R; ABP, A; HBP, H; GBP, G; QBP, Q.
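
The specificity measure defined in the Figure 6 caption can be checked numerically with a short sketch. The function name and the default temperature of 298.15 K are assumptions (the caption states only that RT is about 0.6 kcal/mol).

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def ddg_binding(kd_decoy, kd_target, temperature_k=298.15):
    """Free energy difference RT*ln(Kd(decoy)/Kd(target)) in kcal/mol.

    Positive values indicate preference for the target ligand.
    """
    return R_KCAL * temperature_k * math.log(kd_decoy / kd_target)


# A ten-fold weaker affinity for the decoy than for the target.
ten_fold = ddg_binding(kd_decoy=10.0, kd_target=1.0)
```

With these assumptions a ten-fold difference in affinity evaluates to roughly 1.4 kcal/mol, in agreement with the figure stated in the caption.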
Figure 7 shows quantitative structure-activity relationships (QSARs) for the
ligand-binding affinities of the designed receptors. Calculated affinities are obtained
from the model structure of the complex by:
$\log(K_d) = c_1 + c_2 \Delta G_{elec} + c_3 A + c_4 N_{unsat} + c_5 N_{clash} + c_6 |s - s_0|$.
The linear regression coefficients, $c_1, \ldots, c_6$, were obtained by least-squares
fit of the experimental data; $\Delta G_{elec}$ is an electrostatic contribution (87); $A$ is the
nonpolar contact area between receptor and ligand; $N_{unsat}$ is the number of unsatisfied
hydrogen bonds in the ligand; $N_{clash}$ is the number of steric clashes between the ligand
and receptor (defined as contacts greater than 5 kcal/mol); $s$ is the ratio of the van der
Waals volume of the wild-type ligand to that of the target ligand; $s_0$ is the apparent
optimum value of $s$ for a particular ligand, obtained by the least-squares fit. Analogs are
modeled to bind in the same mode as the target ligand, constructed by superimposition
of the phenyl ring for nitro compounds and the carboxylate moiety for lactate analogs.
(A) Independent QSARs for TNT (filled circle, solid line) and L-lactate (open circle,
dashed line). For TNT, the least-squares fit parameter vector $\{s_0, c_1, c_2, c_3, c_4, c_5, c_6\}$ is
{0.84, -6.2, 0.1, -0.05, 0.5, 2.2, 41.3}; and for L-lactate {1.76, -6.5, 0.09, -0.04, 0.4, 0,
12.7} (for L-lactate $c_5$ is undetermined, since there are no steric clashes). (B) Combined
QSAR obtained by fitting all ligands simultaneously: TNT (filled circle), TNB (filled
square), 2,4-DNT (filled diamond), 2,6-DNT (filled triangle), L-lactate (open circle),
D-lactate (open square), pyruvate (open diamond). All nitro compounds and lactate
analogs were fit together, with only the parameters $s_0$ and $c_6$ being ligand-dependent.
The resulting fit is {(0.85, 1.73), -5.2, 0.04, -0.03, 0.02, 0.9, (54, 12)} ($s_0$ and $c_6$ are
ligand-dependent: the first number refers to the nitro compounds, and the second to the
lactate analogs).
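
The QSAR expression of Figure 7 is a linear combination of structural descriptors and can be evaluated as in the sketch below. The function and constant names are assumptions, the descriptor values in the example are hypothetical, and the units of the descriptors are not restated here; only the TNT coefficients and $s_0$ are taken from the caption.

```python
def qsar_log_kd(coeffs, s0, dg_elec, area, n_unsat, n_clash, s):
    """Evaluate log(Kd) = c1 + c2*dG_elec + c3*A + c4*N_unsat
    + c5*N_clash + c6*|s - s0| for one modeled complex."""
    c1, c2, c3, c4, c5, c6 = coeffs
    return (c1 + c2 * dg_elec + c3 * area + c4 * n_unsat
            + c5 * n_clash + c6 * abs(s - s0))


# Fitted TNT parameters reported in the Figure 7 caption.
TNT_S0 = 0.84
TNT_COEFFS = (-6.2, 0.1, -0.05, 0.5, 2.2, 41.3)

# Hypothetical descriptors: two unsatisfied hydrogen bonds, no clashes,
# no electrostatic or contact-area term, volume ratio at its optimum.
example = qsar_log_kd(TNT_COEFFS, TNT_S0, dg_elec=0.0, area=0.0,
                      n_unsat=2, n_clash=0, s=TNT_S0)
```

In this form each unsatisfied hydrogen bond raises the predicted log(Kd) of a TNT design by $c_4 = 0.5$, that is, it weakens the predicted affinity.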

Figure 8 shows a synthetic two-component signal transduction pathway (84). 
(A) The ligand-bound RBP or GBP (i) interacts with the Trg domain (thick black line) 
of a chimeric transmembrane histidine kinase, Trz (ii), resulting in autophosphorylation 
of the EnvZ domain (grey line), and phosphate transfer to OmpR (iii), which then binds 
to the ompC promoter (iv), upregulating lacZ transcription. (B) Response to TNT
(circle: TNT.R1; square: TNT.R2; diamond: TNT.R3). (C) Response to sugar (open
circle: ribose and wild-type RBP; open square: glucose and wild-type GBP) and
L-lactate (filled circle: Lac.R1; filled square: Lac.G1). β-galactosidase activities are
reported as the difference in assay end-point absorbances of ligand-stimulated and



WO 2005/007806 PCT/US2004/014395

unstimulated cultures. Sensitivity of E. coli to high TNT or L-lactate concentrations 
precluded determination of full dose-response curves. There is no response in the 
absence of receptors or trz. 

Figure 9 shows the chemical structures of soman and related molecules.

Figure 10 shows another embodiment of the invention. Numbers in the flowchart
(A) and molecular drawings (B-E) correspond to processes described herein:
panels 1-2, rotational ligand ensemble; panels 3-4, truncated scaffold with alanine
surface and convex hull; panels 5-6, placed ligand ensemble; panels 7-8, example of a
complementary surface design.

Figure 11 shows structures of GBP and RBP (domains I and II) with
computational models of representative designs (protons are not shown): (A) GBP design PG10,
(B) GBP design PG12, and (C) RBP design PR8. Residues selected for
alanine-scanning mutagenesis are italicized.

Figure 12 shows selection of GBP designs (■, ligand-mediated fluorescent
response with experimentally observed affinities as indicated; •, not tested; o, no
fluorescent response; x, no protein expression; 0, protein precipitation). Designs were
chosen from a final list of candidates using a linear optimization procedure that selected
a subset corresponding to the intersection of the top 20% ligand van der Waals energy,
50% ligand H-bond energy, with all H-bonds satisfied and with solvent-accessible
surface areas less than 15 Å². The designs are shown ranked by the van der Waals
energy ($E_{vdw}$) of the interaction between ligand and receptor, which is a measure of
close packing. Inset: correlation between the experimentally determined PMPA
affinities and $E_{vdw}$ for the tested designs.
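
The selection criteria in the Figure 12 caption amount to intersecting several filters and then ranking by van der Waals energy. The sketch below shows that intersection as a plain filter; it does not reproduce the linear optimization procedure itself, and the function name and the dictionary keys (`name`, `e_vdw`, `e_hbond`, `unsatisfied_hbonds`, `sasa`) are hypothetical.

```python
def select_designs(designs, vdw_fraction=0.20, hbond_fraction=0.50,
                   max_sasa=15.0):
    """Return designs passing all Figure 12 criteria, ranked by E_vdw.

    designs -- list of dicts with keys 'name', 'e_vdw', 'e_hbond',
               'unsatisfied_hbonds', and 'sasa' (in square angstroms).
    Lower (more negative) energies are treated as more favorable.
    """
    def top(key, fraction):
        ranked = sorted(designs, key=lambda d: d[key])
        keep = max(1, int(len(ranked) * fraction))
        return {d["name"] for d in ranked[:keep]}

    best_vdw = top("e_vdw", vdw_fraction)        # top 20% by vdW energy
    best_hbond = top("e_hbond", hbond_fraction)  # top 50% by H-bond energy
    chosen = [
        d for d in designs
        if d["name"] in best_vdw
        and d["name"] in best_hbond
        and d["unsatisfied_hbonds"] == 0   # all H-bonds satisfied
        and d["sasa"] < max_sasa           # solvent-accessible area cutoff
    ]
    return sorted(chosen, key=lambda d: d["e_vdw"])
```

Because the criteria are intersected, a design with excellent packing is still rejected if it leaves a hydrogen bond unsatisfied or exposes too much surface to solvent.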

Figure 13 shows the fluorescent response of fluorescein-labeled PG12 upon
titration with PMPA. Inset: emission spectra of protein in the absence (solid line) or
presence of 0.5 mM PMPA (dashed line).

Figure 14 shows the correlation between experimentally determined fragment
coupling energy, $\Delta G_c$, and the affinity for PMPA, $\Delta G_{b,PMPA}$.

Figure 15 shows biochemical pathways related to triose phosphate isomerase.
(A) Role of TIM in glycolysis, gluconeogenesis, and methylglyoxate metabolism (104,
112) (G6P, glucose-6-phosphate; F1,6P2, fructose-1,6-bisphosphate; PFK,
phosphofructokinase; MGS, methylglyoxate synthetase). (B) TIM mechanism. (C)
Comparison of yeast TIM (110) (flexible loop; catalytic residues; phosphoglycolate)
and RBP (62) (I and II, amino terminal and carboxy terminal domains respectively; H,
hinge region; ribose) structures.

Figure 16 shows the predicted structures of RBP-based designs. (A)
DHAP-binding receptor D1 (stereo view) with ligand and designed complementary surface
residues. (B) NovoTim1.0 (stereo view) with enediolate, catalytic residues, and
complementary surface. (C) NovoTim1.2 with a layer of residues surrounding the
active site, mutation of which confers near wild-type stability (enediolate; catalytic
residues; substrate-binding residues). Also indicated are the mutations isolated by
directed evolution of NovoTim1.2 (view hides 264h) that increase enzyme activity
(NovoTim1.2.1: Lys76iAsn, Lys243iSer; NovoTim1.2.2: Lys76iAla, Glu255iVal;
NovoTim1.2.3: Asp264 H Gln; NovoTim1.2.4: Val55iSer).

Figure 17 shows yet another embodiment of the invention. (A) Integration of
the algorithms for placing side chains and ligands with predefined geometries (85) to
generate partial sites that specify the location and structures of the catalytically active
residues, with the design of stereochemically complementary substrate-binding surfaces
to design complete active sites. (B) Geometrical definition used to generate placement
of the active site residues. Positioning of the catalytic residues (glutamate, histidine,
lysine) is shown relative to the plane of the enediolate. The enediolate conformation is
designed to minimize phosphate elimination, and is derived from the structure of a
phosphoglycolate complex (110). To define the constraints for histidine, a pseudoatom,
ψ, was placed midway (circle) between C1 and C2. Geometrical constraints are
formulated (85) in terms of allowed intervals for bond lengths (l), angles (ω), and torsions (θ)
for each residue relative to the enediolate: glutamate, l(C\,C^: 2-5 A) , C0i(C§, Ci, C 2 : 

25 107°±30°), co 2 (C 1 ,C 2 ,O e1 : 62.3°±30°), 0i(Oi,C 2 , C h C B : 180°±15°), G 2 (C 2 , CiA,O el : 
unconstrained), 0 3 ( Ci,C 5 ,0 £ i,Oe 2 : 0°±30°); histidine: /(N e2 ,\j/ : 2-4A), Q)i(Cy,N £2 ,\|/: 
127.5°), c^OWA: 90°), 0i(Cy,C52, N £2 , \|/: 180°), 0 2 (C 52 , N e2 , V A: 0°±30°), 
03(N £2 ,\|/,C 1 ,O 1 : 0°±45°); lysine: Z(0 2 ,N c : 2-5A), 0)i(C 2 ,O 2 ,N c : 90°-180°), 0)2(O 2 ,N c ,C £ : 
90°-180°), Oi^AO^N^: 180°±90°), 0 2 (C 2 , 0 2 ,N c ,C £ : unconstrained), 

30 03(O 2 ,N;,C e ,C5: unconstrained). 

Figure 18 shows the properties of selected designs. (A) Thermostability
(reported as mid-point transition, Tm, values) monitored by temperature dependence of
ellipticity (119) (wild-type RBP, open diamond, Tm = 58°C; NovoTim1.0, squares, Tm
= 37°C; NovoTim1.1, diamonds, Tm = 43°C; NovoTim1.2, circles, Tm = 52°C).
Steady-state kinetics (Lineweaver-Burk transformation (120)) of NovoTim1.2 for (B) forward
(DHAP to GAP) and (C) reverse (GAP to DHAP) reactions. (D) Alanine mutants of
catalytic residues (E15, H90, K132) in NovoTim1.2, presented as energy difference
diagrams (18) (effects on rate enhancements (kcat changes), stippled; effects on
Michaelis complex (Km changes), hashed). (E) pH dependence of kcat for the forward
(triangles) and reverse (squares) reactions of NovoTim1.2 (calculated
apparent pKa values: forward (6.5, 9.5); reverse (5.9, 9.3)).

DESCRIPTION OF SPECIFIC EMBODIMENTS OF THE INVENTION 
The terms "receptor" and "protein" are used interchangeably herein because the
amino acid residues of the receptor are designed by the invention. It is understood,
however, that the protein can include non-proteinaceous domains, some of which can
contribute to function. The "ligand" is not so limited in its chemical structure because it
can be wholly or partially comprised of amino acid, carbohydrate, fatty acid, and small
organic or inorganic moieties. Similarly, the terms "binding" and "recognition" are
used equivalently. The receptor-ligand nomenclature is somewhat arbitrary because the
terms could be interchanged if the interacting domains of both molecules are
proteinaceous and binding/recognition is mutual.

The methodology utilizes three-dimensional representations of protein structure
(e.g., Cartesian or spherical coordinate sets) to predict the mutations that are
required to change the amino acids in the surface of an existing binding site so that it
binds a new ligand in place of the original ligand. The designed site should bind with a
binding constant (i.e., the concentration of ligand resulting in 50% occupancy of the
designed site: "affinity") and specificity (i.e., binding of the desired "target" ligand
with more favorable affinities than other "decoy" ligands that may or may not
resemble the chemical structure of the target ligand) appropriate for the desired
function(s) of the engineered protein(s). In addition
to the redesign of known ligand-binding sites, the method can design such
receptor-ligand interfaces in regions that are not necessarily known to bind ligands (de novo
design of ligand-binding sites).

A process of the present invention can have the following components:

1. A three-dimensional description (e.g., Cartesian coordinate set) of the protein
structure in which the ligand-binding site is (re-)designed.

2. A definition of the region where the new ligand is to bind (the "target binding
site").

3. A three-dimensional description of the target ligand, as well as any ligand degrees
of freedom.

4. A description of the atomic interactions (e.g., potential function) which describes
the behavior of interactions between a protein and its target ligand at the molecular
level. In general, the "cost function" may include a potential function based on one
or more descriptors. The cost function may also include other considerations: e.g.,
selection of particular amino acid residues or their statistical distribution, chemical
properties built into the ligand-binding site or catalytically-active site, and quantum
mechanical calculation.

5. A three-dimensional description of allowed amino acid structures used to generate
mutations (amino acid "rotamer library").

6. An algorithm that utilizes components 1-5 to predict sets of mutations in the
binding site that bind the target ligand.

These components are described in detail below. In some embodiments, the
invention claims novelty in:

• methods for combining docking of a ligand into a protein scaffold with calculation
of a stereochemically complementary surface,

• the description of the target binding site (component 2),

• the description of atomic interactions (component 4), and

• methods for predicting mutations (component 6).

We have reduced the invention to practice by embodying the method in working
computer programs (the ReceptorDesigner suite, which has been incorporated into a
larger suite of computational protein design programs, known as the DEZYMER suite).
Additionally, we have validated the method by experimentation and created receptors
which bind trinitrotoluene (TNT), L-lactate, D-lactate, serotonin, pinacolyl methyl
phosphonic acid (PMPA), or dihydroxyacetone phosphate (DHAP)/glyceraldehyde
3-phosphate (GAP) with high selectivity and affinity, using a number of different proteins
as starting points. We demonstrate that these computationally predicted, engineered
receptors can function as biosensors (6, 7, 12) for their new ligands, and can be
incorporated into synthetic bacterial signal transduction pathways, thereby regulating
gene expression in response to extracellular TNT or lactate. The use of diverse ligands
and proteins demonstrates experimentally that a high degree of control over biomolecular
recognition has been established computationally. The biological and biosensing
activities of the designed receptors illustrate some of the potential applications of
computational design.

The process of protein design is general, and can be provided with any protein
structure (or model thereof) and target ligand (small molecule, protein, nucleic acid,
carbohydrate, lipid, metal, or other) as input. Consequently it can be used to manipulate
or introduce ligand-binding sites in any protein, for any ligand. The engineered proteins
can be used either as materials ex vivo, taking advantage of the specific, high-affinity
molecular recognition properties of biomolecular interactions, or can be re-introduced
into living systems to function as biologically active components. The scope of
potential applications of this method is therefore very large (described below), encompassing
any field that takes advantage of receptor-ligand interactions.

The process is conveniently implemented as instructions for a computer system
which can be comprised of a processor for calculating values from input data and
otherwise manipulating data; a bus to control the flow of data between the processor
and other devices; one or more input/output devices (e.g., keyboard, display, pointer,
reader or writer of storage medium); and a storage medium. The instructions, data, and
calculated values can be read from or written on media such as, for example, a
mechanical switch or electronic valve, iron core, semiconductor RAM or ROM, magnetic or
optical disk, or paper or magnetic tape. The medium can be erased, refreshed (e.g.,
dynamic), or permanent (e.g., static); it can be fixed or transportable.

The Receptor Design method constructs an ensemble of target ligand poses in
the target ligand-binding site of the scaffold protein structure (the "Docking Zone"),
and constructs an ensemble of side-chain conformations representing a set of possible
mutations at each amino acid position in the target complementary surface (the
"Evolving Zone"). Subsequently, degrees of freedom in the Docking and Evolving
Zones are combined to identify multiple combinations of a single docked ligand pose
with an associated complementary surface (mutant amino acid structure). These
receptor designs are then rank-ordered using a fitness metric and a subset is submitted
for experimentation (fabrication and characterization of engineered, mutant proteins). A
subsequent stage can involve an iteration in which the experimental characterization of
the initial set of designs is used to construct a refined fitness metric which is then used
to re-rank the designs or to produce a new set of designs that are then submitted for
experimentation.

I. Components of the Calculation 
Choice of Scaffold 

The scaffold is a three-dimensional representation of a protein structure (a 
preferred embodiment is a Cartesian coordinate set specifying the position of all or a 
subset of atoms in the protein). This structure can be obtained using any of several 
methods such as, for example: 

• isolation from a library of experimentally-determined structures, such as the Protein 
Data Bank (26), 

• modification of such a structure by programs designed to check the plausibility of 
protein structures and to identify and rectify potential mistakes caused by 
experimental data or model fitting (27, 28), 

• modification by minimization of such a structure against a molecular mechanical 
potential, typically by conjugate gradient descent methods (29), 

• modification by the replacement of particular amino acid side chains by side chains 
of other amino acids, either naturally-occurring or non-naturally-occurring 
(including non-naturally-occurring side chains resulting from the coupling of a 
thiol-reactive group to a reactive cysteine side chain (7, 30)), 

• modeling by any method designed to predict protein structure from sequence, 
particularly homology modeling methods (31), and "ab initio folding" 
methodologies (32), and 

• construction of a "structural ensemble" containing multiple sets of coordinates, thus 
modeling multiple protein conformations (backbone and side chain) or any of the 
above modifications. 

Identification of the Target Ligand-Binding Site

The target ligand-binding site is any region in the scaffold that is desired to bind
the target ligand. Such a region is defined by the coordinates of the Cα carbon atoms in
the structural model of the scaffold, or more preferably by the atoms that describe the
protein "backbone" structure (any or all of amide nitrogen, amide proton, Cα carbon
atom, Cα proton, carbonyl carbon, carbonyl oxygen).

For example, identification of a target ligand-binding site can be based on the
experimentally determined structure of a complex between the scaffold and one or
more of its natural ligands. In this case, the atoms of the scaffold side chains that are in
close contact with the ligand (the interacting atom set) are identified by measuring the
linear distances between these atoms and the ligand, and selecting those amino acid
atoms that are involved in hydrogen bonds, or that are in or near to van der Waals
contact with the ligand. Those amino acids that have interacting atoms form the
"primary complementary surface" (PCS); residues in the PCS can be truncated to
alanine for target ligand docking and complementary surface generation. The PCS
positions then define the target ligand binding site.
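
The distance-based identification of the PCS can be sketched as follows. This is a minimal illustration rather than the automated algorithm of the invention: the function name, the input layout, and the single 4.5 Å cutoff standing in for "in or near to van der Waals contact" are assumptions, and hydrogen-bond geometry is not checked separately.

```python
import math


def primary_complementary_surface(side_chain_atoms, ligand_atoms, cutoff=4.5):
    """Return the residues having any side-chain atom near the ligand.

    side_chain_atoms -- dict mapping residue id -> list of (x, y, z)
                        side-chain atom coordinates, in angstroms
    ligand_atoms     -- list of (x, y, z) ligand atom coordinates
    cutoff           -- contact distance in angstroms (assumed value)
    """
    pcs = set()
    for residue, atoms in side_chain_atoms.items():
        # A single atom pair within the cutoff places the residue in the PCS.
        if any(math.dist(a, l) <= cutoff for a in atoms for l in ligand_atoms):
            pcs.add(residue)
    return pcs
```

For example, with one ligand atom at the origin, a residue with a side-chain atom 3 Å away is selected while a residue 10 Å away is not.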

Alternatively, an entirely novel ligand-binding site can be specified ab initio by 
selecting a set of protein positions which can, upon mutation, plausibly provide a 
complementary surface for the target ligand. 

Identification of the Evolving Zone and Protein Scaffold Truncation 

The "Evolving Zone" (EZ) constitutes the set of residues that are allowed to
mutate ("evolve") during the course of the calculation. In the first instance, the EZ
comprises the residues in the PCS (see above). An additional set of residues can be
included in the EZ, comprised of the layer of amino acids that make direct contact (van
der Waals interactions, hydrogen bonds) with members of the PCS. These residues
interact indirectly with the ligand, forming the "secondary complementary surface"
(SCS); residues in the SCS can be truncated to alanine for target ligand docking and
complementary surface generation. The SCS plays an important role in stabilizing the
PCS (33, 34), contributing to ligand-binding affinity and specificity, as well as protein
stability. Additionally, a "tertiary complementary surface" (TCS) can be included in the
EZ, comprised of residues that either form or potentially can form hydrogen bonds with
residues in the SCS.

Identification of the residues in the PCS, SCS, and TCS is typically performed
using an automated algorithm which analyzes residue-ligand and residue-residue
distances. These automatically identified sets can also be modified by the user,
generally to reflect properties of the target ligand (e.g., size, shape).

Ligand Coordinates 

Three-dimensional atomic coordinates for the covalent structure or structures of
the target ligand can be prepared using any of several methods such as, for example:

• Isolation from a library of experimentally-determined structures, such as the Protein 
Data Bank (26) or the Cambridge Structural Database (35). 

• Modification of such a structure by addition or removal of atoms subject to 
commonly-accepted rules of generating molecular structure and geometry (36). 

• De novo modeling of the structure. This can be carried out using a software
package, such as the Chem3D program of the CambridgeSoft company
(http://www.cambridgesoft.com).

Initial models of molecular structure can be further refined by procedures of
geometric optimization or minimization of a potential function approximating the
relative free energies of various configurations and conformations of the ligand. Such a
potential function can be either molecular mechanical in nature (such as the CHARMM
semi-empirical potential function, or the semi-empirical potential function used in the
further stages of the Receptor Design procedure), or can be quantum mechanical (such
as the MM2 (37), Gaussian (http://www.gaussian.com), or MOPAC (38) molecular
potentials). A covalent configuration of the target ligand is determined by specifying
absolute stereochemistries for all chiral centers, and by specifying values for all bond
lengths, bond angles, and non-rotatable bond dihedral angles in the molecule. Rotatable
bonds are initially placed in low-energy dihedral conformations. A full explicit-
hydrogen model is assumed for all molecular structures.

Description of Molecular Interactions Between the Ligand and the Protein Scaffold

The molecular interactions between the protein and its cognate ligand may be
described by a potential function, the terms of which capture one or more (or all) of
van der Waals interactions, hydrogen bonding, electrostatics, solvation, and internal
entropies of the amino acid side chains and ligand. Such a potential function
consists of two parts: the mathematical functional forms that describe each component,
and the parameters for each atom in the amino acids and ligands that describe the
magnitudes of the interactions (e.g., partial atomic charges, atomic radii, free energies
of partitioning between water and a non-polar reference solvent).

Ligand parameters modeling the non-bonded interactions of ligand atoms can be
derived from any number of sources including, but not limited to:

• Experimentally determined values of atomic radii (39), partial atomic charges (40), 
and hydrogen bond geometries (41). 

• Prediction of these parameters using any number of procedures including empirical
predictions (e.g., the Universal Force Field (UFF) procedure (42)), or quantum
mechanical predictions (e.g., the MM2 package of the Chem3D program).

Similarly, the parameters for the amino acids can be taken from a variety of
sources. A preferred embodiment derives the parameters from the CHARMM23
implementation of the CHARMM molecular mechanical potential function (43).

A particularly important component of a potential function, novel to a preferred
embodiment of this invention, is the "hydrogen bond inventory" term. For a
representative ligand, L-lactate, (i) the hydroxyl group has a hydrogen bond donor and a
hydrogen bond acceptor and (ii) the carboxylate group has two hydrogen bond
acceptors. It has been established that in natural receptor-ligand complexes, the
majority of potential hydrogen bonding groups on the ligand are satisfied either by
direct contacts with the protein, or by water. Our design method therefore explicitly
demands that all possible hydrogen bonding groups on a ligand be satisfied by
hydrogen-bonding partners (contributed by side chain or main chain, or by explicitly
modeled solvent molecules). This requires specialized treatment in the design
algorithms (see below).

Amino Acid Rotamer Libraries

Amino acid rotamer libraries contain descriptions (e.g., Cartesian coordinates)
of all the amino acid side-chain conformations used in the calculations. Typically
"rotamers" refer to the side-chain conformations corresponding to local minima (44,
45). In a preferred embodiment, we use such libraries (45) as starting points that we
augment by adding in side-chain conformations that represent not only the local
minima, but all energetically allowed conformations near these minima.



II. The Calculation 

The calculation takes the components described above and proceeds as follows:

1. Generation of a Docking Zone (DZ), representing all the degrees of freedom of the 
ligand within the target ligand binding site. 

2. Generation of the Evolving Zone (EZ) by placing amino acid rotamer libraries
within the EZ.

3. Minimization of the potential function over all the degrees of freedom within the
EZ and DZ. This procedure produces a single docked ligand conformation, chosen
from the DZ, and a single amino acid sequence, chosen from the EZ, which
together correspond to the lowest value of the potential function (the global energy
minimum, GEM), or near-lowest value. Together these represent the best possible
design (or near best) for the target ligand-binding site, within the
limitations presented by the description of the system (potential function, and
sampling densities used to generate the amino acid rotamers and the ligand
ensemble in the Docking Zone).

4. The GEM can be used to fabricate a single designed protein by experimentation. A
preferred embodiment is to generate a set of designs that constitute "nearby"
solutions to the GEM (the "well set").

5. The well set is then ranked according to a fitness metric which may or may not
correspond to the potential function that was used to generate the GEM (i.e., it may
be another potential function or a different combination of potential functions).

Generation of the Docking Zone

Generation of the Docking Zone is preferably divided into the following: 

1. Replacement of the residues in the PCS with poly-alanine or poly-glycine, thus
truncating the side chains and effectively removing their identity prior to choosing
the newly designed sequence.

2. Generation of all the internal degrees of freedom within the ligand (internal ligand
ensemble, ILE).

3. Generation of all the allowed rotational and translational degrees of freedom of the
ILE placed within the confines of the target ligand-binding site (the placed ligand
ensemble, PLE).

The ILE is generated from the initial model of ligand structure by sampling of
internally rotatable bond dihedral angles according to a molecular mechanical potential
function, and can be performed using either a deterministic or stochastic search
procedure.

Search procedures used for generation of the ILE may be:

• conformational enumeration (deterministic), whereby the ensemble of ligand
conformations is determined by enumeration of possibilities according to a
discretization of the total allowable range of each rotatable dihedral (internal
rotatable bonds have been sampled according to: hydroxyl, 360°, 3 intervals;
carboxylate, 40°, 10 intervals) and

• Metropolis Monte Carlo search (46) (stochastic), whereby ligand conformations are
sampled according to a random walk (both the hydroxyl and the carboxylate
rotatable bonds were sampled over a 360° interval, with moves being made to the
internal steric interactions), using an energy-based decision criterion to accept or
reject proposed conformations.
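
The deterministic enumeration option can be illustrated with a short sketch that builds every combination of discretized dihedral values. The ranges and interval counts are those quoted above for the hydroxyl and carboxylate bonds; the starting angles (0° and -20°), the function names, and the dictionary layout are assumptions made only for this example, and no energy filtering is applied.

```python
import itertools

def dihedral_grid(start, span, intervals):
    """Evenly spaced dihedral values covering `span` degrees from `start`."""
    step = span / intervals
    return [start + i * step for i in range(intervals)]

def enumerate_conformers(rotatable):
    """Cartesian product of the per-bond dihedral grids."""
    names = list(rotatable)
    grids = [dihedral_grid(*rotatable[name]) for name in names]
    return [dict(zip(names, combo)) for combo in itertools.product(*grids)]

# bond name -> (start angle, total range, number of intervals)
rotatable = {
    "hydroxyl": (0.0, 360.0, 3),       # 360 degrees, 3 intervals
    "carboxylate": (-20.0, 40.0, 10),  # 40 degrees, 10 intervals
}

conformers = enumerate_conformers(rotatable)
print(len(conformers), conformers[0])
```

Each resulting dictionary corresponds to one candidate member of the ILE; in practice each would then be scored by the molecular mechanical potential and sterically disallowed members discarded.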

Additional ligand conformations can be obtained by sampling alternative values
for bond length and angles, as well as ring puckers, alternate protonation states and
partial charge sets, and low-barrier stereochemical inversions, such as at atoms with an
open coordination shell.

The PLE can be generated in accordance with the following:

1. Generation of all the molecular rotations of the ILE (the rotational ligand ensemble,
RLE).

2. Generation of the molecular translations of the ILE (the translational ligand
ensemble, TLE).

3. Confinement of the RLE and the TLE to the target ligand-binding site. 

4. Removal of all the ligands generated in stages 1-3, that make unfavorable 
interactions with the protein matrix surrounding the DZ. 

Although each stage can be executed separately, for reasons of computational
efficiency, a preferred embodiment is to combine all four stages into one.

The RLE is generated as a discrete subset of the group of rotations of a
three-dimensional object. The construction of this subset of rotations is preferably
performed using any of several methods such as, for example:

• Using the Eulerian angle description of the rotation group (47), discrete rotations 
are constructed by sampling each Eulerian angle in its interval, according to a user- 
specified coarseness, with sampling of the second Eulerian angle weighted 
according to the sine of the first Eulerian angle, to avoid over-sampling near the 
polar regions of the rotation group. 

• Using the quaternion description of the rotation group (48), discrete rotations are 
constructed by mean square distance minimization (thus choosing a well-dispersed 
subset of the group of all rotations), with each member of this subset corresponding 
to an individual ligand rotation. 

The TLE is generated by constructing a discrete set of points in the protein 
binding site, corresponding to potential positions of the center-of-mass of the target 
ligand. This discrete set of positions of the ligand center-of-mass together comprises 
the "docking grid" term. Generally, a cubic lattice of points is placed in the protein 
binding site, with user-specified rectangular lengths and lattice spacing, and the 
docking grid is taken as that subset of points which satisfy a user-specified minimum 
distance to the truncated protein scaffold. The docking grid can be modified to reflect 
properties of the target ligand (e.g., size, shape). 
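
The docking grid construction can be sketched as a lattice filtered by distance to the scaffold. This is only an illustration under assumed conventions: the function name, the box centered on a user-supplied point, and the toy one-atom scaffold are hypothetical.

```python
import itertools
import math

def docking_grid(center, lengths, spacing, scaffold_atoms, min_dist):
    """Lattice points in a box around `center` that lie at least `min_dist`
    angstroms from every atom of the truncated scaffold."""
    axes = []
    for c, length in zip(center, lengths):
        n = int(length / spacing)
        axes.append([c - length / 2 + i * spacing for i in range(n + 1)])
    return [
        point
        for point in itertools.product(*axes)
        if all(math.dist(point, atom) >= min_dist for atom in scaffold_atoms)
    ]

# Toy input: a 2 x 2 x 2 angstrom box with 1 angstrom spacing and one
# scaffold atom at the origin (values invented for illustration).
grid = docking_grid((0, 0, 0), (2.0, 2.0, 2.0), 1.0, [(0.0, 0.0, 0.0)], 1.2)
print(len(grid))
```

Each surviving point is a candidate position for the ligand center-of-mass; the lattice dimensions, spacing, and minimum distance correspond to the user-specified quantities mentioned above.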

The combined RLE and TLE (docked ligand ensemble, DLE) constitute all possible
rotations and translations of the ILE, and thus together comprise all possible
compatible poses of the target ligand within the design scaffold. The DLE is
constrained to the target ligand-binding site by placing a three-dimensional convex
polyhedron around the target ligand-binding site and confining all or a fraction of the
atoms of each member of the DLE to lie within the polyhedron. A preferred embodiment
is to use a convex hull construct (49). This convex hull can be based on various
objects, including the Cα carbon atoms of the PCS, or the van der Waals surface of the
original ligand. The size of the convex hull can be adjusted by isometric expansions or
contractions.

Generation of the Evolving Zone 

Generation of the Evolving Zone involves placement of amino acid rotamer 
libraries at each of the residue positions in the EZ, and removing those members of the 
rotamer library so placed, that form interactions with the surrounding protein matrix, 
which exceed some threshold value (defined by the user) of the potential function. The 
rotamer libraries can contain representations of amino acids in various combinations: 

• mutation to any of the twenty naturally-occurring amino acids. 

• mutation to any of a subset of the naturally-occurring amino acids. Typical subsets 
of amino acids constructed include, but are not limited to: 

o all amino acids with hydrophobic side chains. 

o all amino acids with hydrophilic side chains. 

o all amino acids except proline, cysteine, and glycine. 

• mutation to any set of amino acids, including any or all of the naturally-occurring 
amino acids, and also including a set of non-naturally-occurring amino acids, 
including, but not limited to, amino acids resulting from the reaction of cysteine 
with a thiol-reactive group. 

• sampling side-chain conformation, with preservation of amino acid identity (i.e., 
allow the structure of a single amino acid side chain to vary in the course of the 
calculation). 

• preservation of amino acid identity and side-chain conformation (i.e., a single fixed 
structure). 

Typical combinations of allowed degrees of freedom for the PCS, SCS, and 
TCS include, but are not limited to: 

• PCS allowed to mutate to all naturally-occurring amino acids; SCS, TCS fixed. 

• PCS allowed to mutate to all naturally-occurring amino acids; SCS allowed to alter
side-chain conformation; TCS fixed.

• PCS, SCS allowed to mutate to all naturally-occurring amino acids; TCS fixed.

• PCS, SCS allowed to mutate to all naturally-occurring amino acids; TCS allowed to
alter side-chain conformation.

The endpoint of a receptor design calculation consists of a set of individual predicted
modes of ligand binding, each associated with a set of mutations to the design scaffold
predicted to provide a complementary protein surface to facilitate ligand binding.
Preferred are two distinct methods for the discovery of these individual ligand pose-
protein sequence combinations:

(i) the method of enumeration of complementary protein surfaces for a discrete and
representative subset of the DLE, thus approximating all possible poses of the ligand in
the target binding site or

(ii) the method of simultaneous ligand-protein optimization, whereby the DLE (all
ligand degrees of freedom) is treated as a super-rotamer, akin to the amino acid side-
chain rotamer degrees of freedom at the positions of the protein.

These two sequence design methods are described below; the method of
representative subset enumeration is a preferred embodiment for sequence design.

Sequence Design: 1. The Representative Subset Enumeration Approach

Given the ligand structures in the DZ (together constituting the DLE), and
amino acid side-chain structures in the EZ, each generated in the stages described
above, the global energy minimum (or approximation thereof) is identified in two
stages:

1. Generation of compatible ligand poses (CLIPs).

2. Sequence optimization (the INTERFACE procedure) in the EZ for each CLIP.

A CLIP is a single ligand conformation ("pose") docked into the target ligand-binding
site; together the CLIPs constitute a representative subset of all DLE members. For
each such conformation, a design calculation is carried out in which a single EZ
sequence corresponding to the GEM or aGEM (approximate global energy minimum)
is identified in the INTERFACE procedure; these GEM (aGEM) values are local to the
CLIP under consideration (the CLIP GEM, cGEM). This approach is essentially an
enumeration of the EZ GEMs (aGEMs) for all the CLIPs. This representative
enumeration is a preferred embodiment of the sequence design algorithm, because it allows
the critical and specialized hydrogen-bond inventory (as well as other) constraints to be
applied to the design process (see below).

Generation of CLIPs

In a typical calculation, the DLE is too large for each member of the ensemble to be
enumerated by the INTERFACE procedure in a practical amount of time. Consequently,
a representative subset is chosen, the CLIPs (in the limit, the set of CLIPs is the same
as the DLE). The CLIPs are chosen by rank-ordering the DLE according to the
interaction energy between each ensemble member and the scaffold in the truncated target
site form (the scaffold interaction energy, E_s). The E_s term consists of van der Waals,
hydrogen bonding and electrostatics components, each of which can either be included
or omitted, as the user desires. For a given form of E_s, the DLE can be rank-ordered
according to E_s itself, or the absolute value of E_s. In the former case, the top-ranked
DLE member represents the ligand pose that has the most favorable interactions with
the truncated design scaffold; in the latter case, the top-ranked member corresponds to
the ligand that has the least interactions (favorable or otherwise) with the scaffold. Both
rankings are equally valid. In addition, a differentness metric can be applied to
members of the DLE, in order to generate a set of CLIPs that together represents all
possible compatible ligand poses. In its simplest implementation, the differentness
metric takes the form of insisting that each member of the TLE (the "docking grid")
contribute a docked ligand pose to the set of CLIPs. In more complex implementations,
the DLE members can be assayed for degree of pairwise overlap, with "overly similar"
DLE pairs prevented from simultaneously existing in the ensemble of CLIPs.
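
The simplest form of CLIP selection described above (rank by E_s or by its absolute value, then require one pose per docking grid point) might be sketched as follows. The record layout, field names, and energy values are hypothetical and serve only to show the two ranking modes.

```python
def select_clips(dle, use_abs=False):
    """Pick the best-ranked DLE member at each docking grid point.

    dle     -- list of dicts with keys "pose", "grid_point", "E_s"
    use_abs -- rank by |E_s| (fewest interactions) instead of E_s
    """
    score = (lambda m: abs(m["E_s"])) if use_abs else (lambda m: m["E_s"])
    best = {}
    for member in sorted(dle, key=score):
        # setdefault keeps only the first (best-ranked) pose per grid point
        best.setdefault(member["grid_point"], member)
    return sorted(best.values(), key=score)

# Invented scaffold interaction energies for four poses on two grid points.
dle = [
    {"pose": "a", "grid_point": 0, "E_s": -5.0},
    {"pose": "b", "grid_point": 0, "E_s": -0.5},
    {"pose": "c", "grid_point": 1, "E_s": 2.0},
    {"pose": "d", "grid_point": 1, "E_s": -1.0},
]

by_energy = [m["pose"] for m in select_clips(dle)]
by_abs = [m["pose"] for m in select_clips(dle, use_abs=True)]
print(by_energy, by_abs)
```

The pairwise-overlap variant of the differentness metric would replace the per-grid-point rule with a similarity test between candidate poses.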


The INTERFACE Procedure 

The INTERFACE procedure identifies protein side-chain sequences and
structures of the binding-site residues which are determined to be compatible both with
individual ligand poses and the protein scaffold. In practice, this is performed by
finding protein sequences and structures which minimize a semi-empirical potential
function describing the interactions between the components of the biomolecular
system (protein and ligand), with treatment of the ligand and its interactions as a
privileged component. The INTERFACE procedure employs a cycle between a
computational search strategy to identify protein sequences predicted to minimize the
potential of the entire biomolecular system, and specialized sequence design algorithms
(the INCREDIBLE algorithms) to identify and eliminate particular side-chain
structures incompatible with a well-formed interface between protein and ligand, for
example, those side chains whose presence results in unsatisfied ligand hydrogen-
bonding potential, or the disruption of the lock-and-key fit between protein and ligand.

The sequence design algorithm can be any that has been developed for
sequence optimization (stochastic or deterministic), including, but not
limited to:

• Simulated Annealing algorithms for sequence design (50) (stochastic). 

• Monte Carlo search algorithms for sequence design (51) (stochastic). 

• Genetic Algorithms for sequence designs (52) (stochastic). 

• Dead-end elimination (DEE) algorithms for sequence design (16, 53)
(deterministic).

• FASTER algorithms (54) (deterministic/stochastic). 

• Enumeration algorithms for sequence design (55) (deterministic). 

In a preferred embodiment, we use a combination of DEE and FASTER
algorithms, which together with the INCREDIBLE algorithms, design a highly
complementary surface to an individual CLIP pose.

The INCREDIBLE Algorithm 

The INCREDIBLE (INCompatible Rotamer Elimination for the Design of
Interfaces and Binding of Ligands) algorithms capture critical aspects of molecular
recognition, such as the lock-and-key steric complementarity between protein and
ligand (56), and the satisfaction of the hydrogen bond inventory of the ligand (IS),
which are deemed to be more important to successful interface design than is the value
of the overall molecular potential (which can include interactions distal from the
ligand). Each of the INCREDIBLE algorithms employed in a calculation is applied
iteratively as the sequence design algorithm converges, in stages, towards an energy
minimum of the entire biomolecular potential of the system. The INCREDIBLE
algorithms function to drive the designed protein sequence towards solutions which
optimize characteristics of the immediate receptor-ligand interface, as opposed to those
designed sequences many of whose favorable interactions are not between protein and
ligand. Any quantitative characteristic of the receptor-ligand interface can be employed
to drive an INCREDIBLE algorithm, although there are two preferred embodiments:
1. The "hydrogen bond inventory" of the ligand. In this INCREDIBLE algorithm
implementation, the sequence design algorithm is guided into any subset of
sequence space which can be determined to be that most likely to completely or
maximally satisfy the "hydrogen bond inventory" of the target ligand, i.e., all ligand
hydrogen bond donors and acceptors. In this manner, designed sequences which
form some favorable interactions but fail to satisfy the hydrogen bonding capacity
of the ligand (a critical component of a well-formed interface), are iteratively
pruned from the available sequence space, thus ensuring ligand hydrogen bond
inventory satisfaction, regardless of the other components of the overall
biomolecular potential function. In the standard implementation of this
INCREDIBLE algorithm, if at any point during the sequence optimization, it can be
determined that all remaining side chains which satisfy a particular ligand hydrogen
bond arise from the same protein position, then all non-hydrogen-bonding side
chains at this position are eliminated from the sequence space. This ensures that the
designed protein sequence satisfies this element of the ligand hydrogen bond
inventory.

2. The elimination of cavities from the designed receptor-ligand interface (the
"binding surface inventory"). The implementation of this INCREDIBLE algorithm
is similar to that for the ligand hydrogen bond inventory. If, at any point in the
complementary surface optimization, it can be determined that a particular and
substantive portion of the ligand binding surface can be in close association
("binding surface satisfaction") with only protein side chains arising from a single
residue position, then all side chains which do not satisfy this binding surface
("cavity-forming" side chains) at this position are eliminated.
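
The elimination rule of the hydrogen bond inventory embodiment can be sketched on a toy rotamer table. The data structure, rotamer names, and ligand group labels below are hypothetical, and the sketch interprets "non-hydrogen-bonding side chains" as those that do not satisfy the particular ligand group in question, which is an assumption.

```python
def prune_hbond_inventory(rotamers, ligand_groups):
    """Apply the single-position elimination rule until nothing changes.

    rotamers      -- {position: {rotamer_name: set of ligand groups satisfied}}
    ligand_groups -- hydrogen-bonding groups in the ligand inventory
    """
    pruned = {pos: dict(rots) for pos, rots in rotamers.items()}
    changed = True
    while changed:
        changed = False
        for group in ligand_groups:
            providers = [
                pos for pos, rots in pruned.items()
                if any(group in satisfied for satisfied in rots.values())
            ]
            if len(providers) != 1:
                continue  # rule applies only when a single position remains
            pos = providers[0]
            keep = {n: s for n, s in pruned[pos].items() if group in s}
            if len(keep) < len(pruned[pos]):
                pruned[pos] = keep
                changed = True
    return pruned

# Invented example: only position 45 can still satisfy ligand group "O1".
rotamers = {
    45: {"Ser_1": {"O1"}, "Ala_1": set(), "Asn_1": {"O1", "O2"}},
    78: {"Arg_1": {"O2"}, "Leu_1": set()},
}
pruned = prune_hbond_inventory(rotamers, ["O1", "O2", "OH"])
print(sorted(pruned[45]), sorted(pruned[78]))
```

The binding surface inventory embodiment would follow the same pattern, with ligand surface patches taking the place of hydrogen-bonding groups.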






Sequence Design: 2. The DLE Super-Rotamer Method

In an alternative to the method of CLIP representative subset generation, the
problems of ligand pose placement and protein sequence design can be combined, with
the resulting GEM or aGEM thus constituting a ligand pose and an associated protein
complementary surface, which is deemed to be the best possible (or near best) design
for the ligand binding site, as determined by the value of the design potential for the
ligand-protein system. The DLE super-rotamer method is incompatible with the
INCREDIBLE algorithms, however, which are an important driving force for
optimization of the immediate receptor-ligand interface. It is for this reason that the
CLIP representative subset generation method is a preferred embodiment for generation
of the initial family of receptor-ligand designed interfaces.

Generation of Well Sets

Although the sequences corresponding to the GEM or aGEM of the system are
invaluable reference points in the design procedure, it is typically necessary to identify
other sequences that are closely related either in sequence space (e.g., single point
mutations or combinations thereof), or in energy space (e.g., within an interval ΔE_well of
the GEM or aGEM of the entire system); such sequences are designated by the "well
set" term. The generation of well sets has two functions: a) it provides a set of plausible
designs for empirical evaluation which mitigates prediction inaccuracies and b) it
allows potential functions other than the one used to generate the GEM (aGEM) or the
well set to be used (see description of the LORD procedure below). Of particular value
is to generate a well set that falls within ΔE_well of the GEM or aGEM, and then to
rank-order these according to some evaluation criteria other than the original potential
function.

Well sets can be generated by the following:

1. Use all the cGEMs as a well set.

2. Stochastic or deterministic generation of well sets from the GEM, aGEM, or from
cGEMs, using the OVERLORD procedure (Optimize, Vary, & Explore Related
sequences with the LORD procedure) described below.

Ranking Wells: the LORD Procedure 
Well members can be ranked according to the potential function used in the
calculation. However, a more typical ranking method is to use descriptors that are more
sophisticated than the potential function used to generate the well members in the first
place. This is performed by the LORD (Linear Optimization of Ranking Descriptors)
procedure, using ranking descriptors that are intended to be a more realistic evaluation
of the quality of a ligand-protein interface, and can differ greatly in functional form
(typically not pairwise-decomposable, as is the design potential) and ease of
computation (typically more time consuming) from the semi-empirical design potential.
Ranking descriptors employed in the LORD procedure may include, but are not limited 
to: 

• value of the semi-empirical design potential restricted to the immediate receptor- 
ligand interface 

• value of the semi-empirical design potential for the entire designed protein 

• number of unsatisfied hydrogen-bonding atoms in the ligand 

• number of unsatisfied hydrogen-bonding atoms in the PCS 

• exposed solvent-accessible surface area (SASA) of the ligand 

• total volume of any cavities in the ligand-protein interface 

• total enthalpy of all hydrogen bonds between protein and ligand 

• steric complementarity of ligand and protein, as determined by: 
o total van der Waals interactions 

o complementary interaction surface area (57) 
o Voronoi tessellation (58) 

There are two forms of the LORD procedure: 

1. Protein sequences are chosen from the set of all well members, which 
simultaneously score well according to each ranking descriptor, to a user-specified 
extent for each ranking descriptor (either by restricting the analysis to those well 
members which score in some top fraction for each ranking descriptor, or which 
have a value of a ranking descriptor less than some absolute value, typically in the 
case of the unsatisfied hydrogen bond descriptor). All well members which thus 
perform satisfactorily well according to every ranking descriptor are finally rank- 
ordered according to a user-specified ranking descriptor deemed to be the most 
indicative of the quality of the receptor-ligand interface, with this rank-ordered list 
being submitted to further analysis. 

2. Any combination (linear or otherwise) of existing ranking descriptors constitutes a
further ranking descriptor, which captures aspects of its component descriptors.
This is most useful when a large database of designed receptors has been
characterized both in silico and in vitro. In this instance, a quantitative structure-
activity relationship (QSAR) can be constructed to postdict the experimentally
determined performance of each receptor (ligand binding affinity, ligand binding
specificity, receptor stability) in terms of the ranking descriptors computed for that
receptor and receptor-ligand interface. In this manner, a novel ranking descriptor of
maximal correlation is constructed against the experimental data. This "semi-
empirical" ranking descriptor can then be used in further design of receptors for the
same ligand, similar ligands, or even structurally and chemically diverse ligands.
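
The first form of the LORD procedure (filter on every descriptor, then rank on one) can be sketched as below. The descriptor names, the convention that lower values are better for every descriptor, and the numerical values are assumptions made for the illustration.

```python
def lord_filter(wells, top_fraction, absolute_max, final_key):
    """Keep well members that pass every descriptor test, then rank them.

    wells        -- list of dicts of descriptor values (lower is better)
    top_fraction -- {descriptor: fraction of the well set allowed to pass}
    absolute_max -- {descriptor: largest acceptable value}
    final_key    -- descriptor used for the final rank-ordering
    """
    survivors = list(wells)
    for name, fraction in top_fraction.items():
        ranked = sorted(wells, key=lambda w: w[name])
        n_keep = max(1, int(len(ranked) * fraction))
        cutoff = ranked[n_keep - 1][name]
        survivors = [w for w in survivors if w[name] <= cutoff]
    for name, limit in absolute_max.items():
        survivors = [w for w in survivors if w[name] <= limit]
    return sorted(survivors, key=lambda w: w[final_key])

# Invented descriptor values for four well members.
wells = [
    {"name": "A", "interface_energy": -30.0, "cavity_volume": 10.0, "unsat_hbonds": 0},
    {"name": "B", "interface_energy": -35.0, "cavity_volume": 40.0, "unsat_hbonds": 0},
    {"name": "C", "interface_energy": -20.0, "cavity_volume": 5.0, "unsat_hbonds": 2},
    {"name": "D", "interface_energy": -28.0, "cavity_volume": 12.0, "unsat_hbonds": 0},
]
ranked = lord_filter(
    wells,
    top_fraction={"interface_energy": 0.75, "cavity_volume": 0.75},
    absolute_max={"unsat_hbonds": 0},
    final_key="interface_energy",
)
ranked_names = [w["name"] for w in ranked]
print(ranked_names)
```

Here B is rejected for its large cavity volume and C for its unsatisfied hydrogen bonds, even though each scores best on one other descriptor.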


Ranking Descriptors not Based on the Semi-Empirical Force Field 

Many ranking descriptors are obtained by application of the semi-empirical 
design potential (or particular components) to a subset of the system, particularly the 
receptor-ligand interface. Some, however, are of a different nature: 
• The solvent-accessible surface area (SASA) of the target ligand within a designed
interface can be computed by the Connolly surface area algorithm with a probe radius
of 1.4 Å. The SASA of the target ligand is computed within the designed well member
complementary surface, using a full hydrogen model.

• A ranking descriptor which describes cavities between protein and ligand is also
commonly employed. A cubic lattice of grid points of user-specified rectangular
lengths and grid spacing is placed around the ligand in the well member binding
site. Each of these points is queried for distance to ligand, protein, and bulk solvent.
Those points which are sufficiently distant from protein and ligand to represent
electron density coverage of either (typically set at 1 Å), but simultaneously
sufficiently close to prevent explicit solvent molecule entry (typically set at 1.5 Å),
are deemed to constitute a cavity between protein and ligand. This set of "cavity
points" is converted to a "cavity volume" which is used as a ranking descriptor.

• An independent estimator of ligand affinity can be used as a ranking descriptor.
This can take the form of an external software package, e.g., a quantum mechanical
program with ligand affinity estimation capability.

• When the designed complementary surface is intended to be catalytically active
(i.e., an enzyme design calculation), any estimator of reactivity of the ligand
(substrate)-complementary surface pair can be employed as a ranking descriptor.
This can consist of any prediction of pKa or electron localization for predicted
active site residues, or any external software package for the modeling of protein-
substrate reactivity.

Generation of Wells: the OVERLORD Procedure 

Exploration of the well around a GEM or aGEM ("well exploration") can be performed
by any computational search strategy (deterministic or stochastic), with a preference
for Monte Carlo-based stochastic search techniques (51), or a search algorithm based
on either the DEE (24, 59) or the FASTER computational search strategy (54):

• In a typical Monte Carlo stochastic search, random steps in sequence space
(typically point mutations) are taken around GEMs or aGEMs to generate an
ensemble of well member sequences. Moves in sequence space are typically
accepted according to a probability which decreases according to the size of the
potential energy increase. Multiple, independent random walks can be initiated
around a given GEM or aGEM, with the resulting sequence wells being collated.
Well member sequences can additionally be constrained to lie within a fixed ΔE_well
potential energy difference from the initial GEM.

• The DEE algorithms (24, 59) can also be used with a fixed, positive value of ΔE_well
to eliminate individual rotamers which can provably not be a member of any
sequence within ΔE_well of the GEM. Any remaining sequence space can be explored
by enumeration or a tree search method to construct well members.

• A modification of the FASTER algorithms (54) which combines perturbation,
relaxation, and random mutagenesis can be used to construct well members. In this 
search strategy, the initial GEM sequence is subjected to iterative rounds of random 
mutagenesis (a user-specified number of point mutants), followed by a standard 
implementation of the FASTER algorithms (typically batch relaxation or single- 
residue perturbation/ batch relaxation) to optimize the remainder of the sequence 
not the subject of the random mutagenesis. Multiple, independent trajectories can 
be taken away from the initial GEM, with the results being collated. 
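The Monte Carlo variant of well exploration described above can be sketched as follows. This is a minimal illustration only, not the OVERLORD implementation: the function names, the Metropolis acceptance rule, the temperature parameter, and the toy energy function (one energy unit per mutation away from the GEM) are all assumptions introduced for the example.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_energy(seq, gem):
    """Hypothetical stand-in for the design potential: 1.0 per mutation from the GEM."""
    return float(sum(a != b for a, b in zip(seq, gem)))


def explore_well(gem, energy, delta_e_well, n_steps=2000, n_walks=5,
                 temperature=1.0, seed=0):
    """Collect sequences within delta_e_well of the GEM by Metropolis random walks."""
    rng = random.Random(seed)
    e_gem = energy(gem)
    well = {gem: e_gem}
    for _walk in range(n_walks):              # multiple independent walks, collated
        current, e_current = gem, e_gem
        for _step in range(n_steps):
            # Random step in sequence space: a single point mutation.
            pos = rng.randrange(len(current))
            candidate = current[:pos] + rng.choice(AMINO_ACIDS) + current[pos + 1:]
            e_candidate = energy(candidate)
            # Constrain well members to a fixed energy window above the GEM.
            if e_candidate - e_gem > delta_e_well:
                continue
            # Acceptance probability decreases with the size of the energy increase.
            delta = e_candidate - e_current
            if delta <= 0 or rng.random() < math.exp(-delta / temperature):
                current, e_current = candidate, e_candidate
                well[current] = e_current
    return well
```

Calling `explore_well("ACDEF", lambda s: toy_energy(s, "ACDEF"), delta_e_well=2.0)` returns a dictionary mapping each visited well member to its energy; in a real calculation the toy energy would be replaced by the design potential.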




Quantitative Structure-Activity Relationships (QSARs) 

QSAR construction is typically performed by single-variable linear regression,
optimizing the coefficients of the separate ranking descriptors (independent variables) to
maximize the correlation (R-value) with the experimentally determined receptor
performance (e.g., ligand binding affinity, catalytic rate, other biochemical activities).
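A single-variable fit of this kind can be sketched as an ordinary least-squares regression of the measured performance on one ranking descriptor, reporting the correlation coefficient. The function name and the idea of fitting each descriptor separately are assumptions made for illustration; the source does not specify the regression code.

```python
import math


def fit_descriptor(descriptor, performance):
    """Least-squares fit performance ~ slope * descriptor + intercept.

    Returns (slope, intercept, r), where r is the Pearson correlation (R-value).
    """
    n = len(descriptor)
    mean_x = sum(descriptor) / n
    mean_y = sum(performance) / n
    sxx = sum((x - mean_x) ** 2 for x in descriptor)
    syy = sum((y - mean_y) ** 2 for y in performance)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(descriptor, performance))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r
```

In use, `descriptor` would hold one ranking descriptor per design (for example, a count of unsatisfied hydrogen bonds) and `performance` the corresponding measured values (for example, binding free energies); the fitted slope is that descriptor's coefficient.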

FIELDS OF APPLICATION 

As demonstrated, the computational design methodology is general, and can be
given any protein structure (or model thereof) and target ligand (small molecule,
protein, nucleic acid, carbohydrate, lipid, metal, or other) as input. Consequently it can
be used to manipulate or introduce ligand-binding sites in any protein, for any ligand.
The engineered proteins can be used either as materials ex vivo, taking advantage of the
specific, high-affinity molecular recognition properties of biomolecular interactions, or
can be re-introduced into an organism to function as in vivo biologically active
components.

Nucleic acid encoding protein(s) designed by the invention can be introduced
by gene transfection, viral infection, or recombination with an endogenous gene. The
encoded protein can interact with an endogenous pathway (e.g., receptor) or a pathway
with one or more exogenous components (e.g., kinase, phosphatase, other enzyme,
channel or transporter). The organism may be microbial (e.g., archaebacterium, eubacterium,
fungus, virus), animal, or plant. A DNA or RNA vector comprised of a nucleotide
sequence encoding the protein(s) and one or more regulatory regions (e.g., constitutive
or inducible promoter; other regions which regulate transcription, translation, or
replication) may be used to transfer and/or to express sequences.

The protein may be chemically synthesized, in vitro transcribed/translated (e.g.,
cell-free systems, reticulocyte lysate), or expressed in a cultured cell or organism. One
or more non-natural residues may be substituted for an amino acid residue of the
protein by chemical synthesis or elongation with an artificially charged transfer RNA.
One or more non-natural side chains may also be incorporated into the protein in this
manner. Protein may also be post-translationally modified. Therefore, the chemical
properties of a side chain or its geometric positioning in the protein may be determined
by a structure other than the 20 natural amino acid residues. The protein may be
comprised of the mature amino acid sequence (see Tables 1, 3 and 5) as well as other
protein domains (e.g., a signal peptide which causes secretion, another cell localization
signal, an anchor peptide which is membrane inserted, an affinity peptide for
purification). Synthetic peptide cleavage signals may be inserted between such domains
to produce mature protein by proteolysis. Protein may be purified by biochemical
procedures known in the art: centrifugation, chromatography (e.g., affinity, ion
exchange, gel sizing, hydrophobic/hydrophilic interaction), electrophoresis, and
precipitation.

The protein designs obtained by the invention may be used as a library of amino
acid sequences prior to confirmation of binding to ligand or an analog thereof. For
example, the library may be used with or without other sequences in a gene shuffling or
directed/random evolution process to provide improved proteins whose binding activity
is then confirmed. The high efficiency of the invention in designing protein with
binding activity may provide one or more potential mutants which can be further
manipulated without experimentally confirming that they bind ligand. Alternatively,
confirmation of binding may be performed with an analog of the ligand which is bound
(e.g., PMPA in Example 2) or the reactive substrate or product of an enzyme (e.g.,
DHAP and GAP in Example 3).

The protein may be designed with more than 10, more than 15, more than 20,
more than 25, or more than 30 changes in the amino acid sequence as compared to the
starting protein for which a structure has been determined or is predicted. Thus, the
structure of a protein may be used to predict the structure of a mutant or analog thereof
which is the basis for a new protein design. The ligand may bind protein with at least
micromolar, at least nanomolar, or at least picomolar affinity. For a protein with
catalytic activity, a rate enhancement of at least 10^3-fold, at least 10^4-fold, at least
10^5-fold, or at least 10^6-fold over the uncatalyzed reaction is preferred.

The scope of potential applications of this method is large, encompassing any 
field that takes advantage of receptor-ligand interactions, including, but not limited to: 
• The construction of biosensors (ex vivo or in vivo), in which the (re-)designed
protein functions as a molecular recognition element for an analyte and is coupled
to a signal transduction mechanism that converts ligand binding into a readout signal
that can be utilized in a detector (6, 67, 68).

• Affinity purification reagent (ex vivo), in which the (re-)designed protein functions
as a molecular recognition element that preferentially binds a molecule in a
mixture.

• Chiral purifications (ex vivo), in which the (re-)designed protein functions as a 
molecular recognition element that preferentially binds one stereoisomer over 
others (10, 69). 

• Synthetic signal transduction pathways (in vivo) in which (re-)designed receptors 
mediate a biochemical response to a ligand (70) (agonist or antagonist). 

• Synthetic genetic circuits (in vivo) in which (re-)designed proteins mediate the 
ligand-dependent action of a genetic control element (1, 71, 72) (including but not 
limited to repressor or activator proteins). 

• (Re-)Design of allosteric regulator elements in enzymes, receptors, or DNA-binding
proteins, in which the binding site is structurally, thermodynamically, and kinetically
coupled to another site (or multiple other sites) such that binding of a ligand at the
(re-)designed site alters the activity at the other site(s).

• Synthetically controlled metabolic pathways (in vivo), in which an enzyme with an 
engineered allosteric control element is used to control the flux of metabolites 
through a pathway. 

• Enzyme redesign to alter the binding specificity of a known enzyme active site. 

• Enzyme design in which a new catalytically active site is constructed (73). 

The range of ligands that can be addressed by the computational design
algorithms described here includes, but is not limited to:

• Toxins, including but not limited to: 
o Chemical warfare agents
o Biological warfare agents
o Industrial pollutants
o Pesticides & herbicides
o Carcinogens
o Neurotoxins

• Explosives 

• Metabolites 

• Drugs and drug precursors

• Neurotransmitters

• Disease state indicators 

• Chiral fine chemicals 

• Precursors and components in the stages of a (bio-)chemical synthesis 

The range of proteins that can be used as scaffolds for the computational design
algorithms described here includes, but is not limited to:

• The family of bacterial periplasmic binding proteins (PBPs), including but not 
limited to the Gram negative receptors for amino acids, carbohydrates, cations, 
anions, and vitamins. 

• The superfamily of proteins containing the PBPs, including but not limited to the
eukaryotic glutamate receptors, transcription factors including LacI, and enzymes such
as cyclohexadienyl dehydratase (74).

• The superfamily of nuclear metabolite receptors, including but not limited to 
receptors for hormones, vitamins, xenobiotics, and fatty acids (75). 

• Proteins with multiple, allosterically-coupled, binding sites. 

• Antibodies (76). 

• Beta-clamshell proteins, such as olfactory proteins (77). 

• The family of cytoplasmic, antiparallel β-barrel ligand-binding proteins, such as the
fatty acid binding proteins (78).

• Proteins which function as members of enzymatic pathways, whereby redesign of a 
binding site allows for the creation of pathways with novel functionalities. 

Biosensors 

At the molecular level, biosensors combine molecular recognition with
transduction of a ligand-binding event into a detectable physical signal that can be
utilized in the construction of a device for the detection of the analyte (6). Biosensors
can utilize any protein that binds a ligand including, but not limited to, enzymes,
receptors, or antibodies. Signal transduction can take place entirely in vitro by
integrating the molecular recognition element into a physical device (67), or it can be
cell-based (68), in which case the molecular recognition element controls a biochemical
or genetic response. The computational design process described here can be used to
construct the molecular recognition element in such biosensors. An advantage of the
invention is that by suitable attachment of a reporter group in the hinge of PBP (or an
allosteric movement of the receptor in response to binding of ligand), no addition of
exotic reagents is needed to generate a signal.

An example of the utility of the computational design methodology is afforded 
by the redesign of the PBPs to bind target ligands unrelated in structure to the natural 
ligand. The PBPs have been engineered to couple ligand-binding events to changes in 
fluorescence (7, 34, 79, 80) or redox activity (81), by coupling fluorophores or redox 
reporter groups respectively at locations where these reporter groups are sensitive to 
ligand-mediated hinge-bending motions that typify this protein superfamily. These 
engineered proteins therefore function as reagentless optical or bioelectronic sensors for 
the ligands to which they bind. This reagentless coupling mechanism is maintained 
even upon drastic redesign of the ligand-binding sites (11, 34). Consequently, the 
computational design methodology described here enables families of biosensors to be 
engineered for any ligand that can be accommodated in such PBPs. 

Potential applications for engineered biosensor proteins include but are not 
limited to: 

• Food processing management. 

• Detection of pollutants and toxins as an initial stage in bioremediation. 

• Detection of explosives, chemical threats, and biological threats for purposes of 
homeland security and weapons inspection. 

• Detection of disease state indicators and metabolite concentration determination for
purposes of:
o Real-time health monitoring. 

o Basic biomedical research, such as the detection of particular metabolites 
(metabolomics) or signal-transduction intermediates. 

• Drug detection and drug concentration determination for purposes of: 
o Monitoring drug administration regimens. 

o Detection of banned substances. 

o Determination of individual pharmacokinetic response. 

• Detection of final product and precursors of the synthesis of: 
o Pharmaceuticals. 

o Fine chemicals.

• Detection of enantiomers and diastereomers, particularly of pharmaceuticals and
fine chemicals, which proves difficult by traditional chiral separation techniques.

Affinity Purification

Proteins that have been engineered by the computational design methodology
described here to bind a particular molecule preferentially can be used to selectively
purify or deplete that molecule from a complex mixture by affinity chromatography. In
this method, the engineered protein is immobilized on a solid support. Upon exposure
of this derivatized support to a complex mixture, the molecule of interest will be
selectively adsorbed onto the matrix. Such a matrix can be used either in batch
purification (matrix is mixed with mixture, and allowed to settle out) or in column
chromatography (matrix is confined, and mixture is flowed through). This affinity
chromatography methodology can be used to purify molecules from a complex mixture
such as multiple products obtained in a chemical synthesis. The methodology can also
be used in detoxification using the matrix to deplete a toxic molecule from a mixture.
Solutions of interest for such detoxifications include, but are not limited to, drinking
water or blood.

Signal Transduction Pathways 

The control of cellular physiology and gene transcription in response to
extracellular or intracellular signals is a fundamental property of living systems. Such
responses are mediated by complex pathways that are initiated and regulated by ligand
binding to receptor. An illustration of this is afforded by the demonstration that
redesigned PBPs can control signal transduction pathways that respond to the target
ligands upon re-introduction into E. coli (see below).

Such synthetic signal transduction pathways can be used to engineer cells, 
tissues, and whole organisms in principle to link any input to any output. Applications 
include, but are not limited to: 

• Cell-based biosensors by coupling the input to changes in an electromagnetic (e.g.,
current, voltage, frequency) or optical (e.g., intensity, wavelength, polarization)
signal readable as a detectable output (e.g., colored light, fluorescence).

• Cell-based bioremediation by coupling the input to production of enzymes to
degrade the target ligand(s).

• The engineering of smart therapeutic cells, by coupling the input to the production
of repair enzymes, or agents that kill pathogens or cancerous cells, or to the
secretion of a therapeutic molecule, such as a small organic molecule (e.g., drug),
hormone (e.g., insulin), or immunoregulatory molecule (e.g., cytokine).

• Induction of differentiation such as the production of fruiting bodies in response to 
an external ligand. 

Chiral Purification

A special case of the affinity purifications described above is chiral
purification. Many molecules possess asymmetric centers. Consequently, molecules
can exist in multiple, structurally distinct forms (stereoisomers). This asymmetry is of
particular importance in living systems, since proteins typically interact with one
stereoisomer only. Consequently, for many drugs, only one stereoisomer (the
"eutomer") exhibits the desired pharmacological activity, whereas the other
stereoisomer(s) (the "distomer(s)") are either inactive or associated with side-effects
(10). Chiral purification of drugs is therefore of great importance for safe
administration, and nowadays is mandated by the U.S. Food and Drug Administration
(82). The importance of chirality applies not only to drugs, but to most complex
chemical materials. The computational design technique can be used to design proteins
that bind one stereoisomer preferentially over another.

Such chiral purifications are illustrated by design GBP.G1, a GBP variant
designed to bind L-lactate (Table 1). This designed receptor differentiates between L-
and D-lactate (Table 2). Separate columns were prepared with wild-type GBP and
GBP.G1, respectively, covalently coupled to the resin. A racemic mixture of L- and D-
lactate was applied to each column, and the eluate assayed optically for lactate. The
designed receptor cleanly separates the two enantiomers, whereas the wild-type protein
does not.

Design of Enzymes

With the input of a transition state model for a particular catalytic conversion as
the target ligand, a Receptor Design calculation allows for the construction of proteins
predicted by the "transition state stabilization" theory of catalysis (18) to function as
enzymes (e.g., oxidoreductases which catalyze oxidation-reduction reactions,
transferases which catalyze transfer of functional groups, hydrolases which catalyze
hydrolysis reactions, lyases which catalyze additions to a double bond, isomerases
which catalyze isomerization reactions, and ligases which catalyze formation of bonds
with ATP cleavage) catalyzing that particular molecular conversion. Kinetics of the
enzyme, its substrate and/or cofactor specificity, and inhibition can be changed. More
generally, the Receptor Design algorithm can be used in conjunction with other
computational techniques, such as the "site search" method of geometric optimization
(85) or a quantum mechanical design methodology. After positioning of the catalytic
active site residues by one of these methods, the Receptor Design algorithm can be
employed to design the remainder of the complementary surface in the active site.
Optionally, directed or random mutagenesis methods (e.g., site-directed mutagenesis,
error-prone polymerase, gene shuffling, directed evolution) may be used after design of
the ligand-binding site and/or catalytically-active site to improve binding affinity,
catalytic rate, enzyme turnover, protein stability, or a combination thereof.

EXAMPLE 1 

The computational design method described above has been reduced to practice 
in a specific embodiment (Fig. 1) in operational computer programs (ReceptorDesigner 
programs that form a component of the DEZYMER suite) and experimental validation 
of designs generated by the ReceptorDesigner programs. 

The Receptor Design procedure was used to engineer TNT, L-lactate, D-lactate,
or serotonin binding sites in place of the wild-type sugar or amino acid ligands of five
members of the Escherichia coli periplasmic binding protein (PBP) superfamily (60),
using the high-resolution three-dimensional structures of the closed conformation of
these proteins complexed with their wild-type ligand as starting points for the
calculation (Fig. 2A): glucose-binding protein (GBP) (61), ribose-binding protein
(RBP) (62), arabinose-binding protein (ABP) (63), glutamine-binding protein (QBP)
(64), and histidine-binding protein (HBP) (65); the PDB database lists the structures
and wild-type amino acid sequences as 2GBP (SEQ ID NO:1), 2DRI (SEQ ID NO:2),
1ABE (SEQ ID NO:3), 1WDN (SEQ ID NO:4), and 1HSL (SEQ ID NO:5),
respectively. These periplasmic proteins are synthesized as precursors consisting of a
signal peptide and the mature amino acid sequence provided herein. The variation in
structure and sequence (60) of these proteins presents distinct starting points for the
design calculations. The three target ligands selected for this study bear little
resemblance to the wild-type, cognate ligands of the chosen PBPs, are chemically
distinct from each other, and in one case (TNT) represent a non-natural molecule. The
designs therefore explore critical parameters of molecular recognition, including
molecular shape, chirality, functional groups (hydrogen bonding: nitro (acceptor),
hydroxyl (donor and acceptor), carboxylate (acceptor); molecular surface: polar,
aliphatic, aromatic), internal flexibility (TNT < L,D-lactate < serotonin), charge (TNT:
neutral; L,D-lactate: anionic; serotonin: cationic), and water solubility (TNT <
serotonin < L,D-lactate).

Complementary surfaces were designed for TNT in RBP, ABP, and HBP; for
L-lactate in ABP, GBP, RBP, HBP, and QBP; for D-lactate in GBP; and for serotonin in
ABP (Fig. 3; Table 1). The designed surfaces are electrically neutral for TNT,
positively charged for lactate, and negatively charged for serotonin. Hydrophobic
groups of all three target ligands interact primarily with aliphatic side chains, although
several examples of aromatic interactions are seen (TNT.A1, TNT.H1, L-Lac.G1,
D-Lac.G1, D-Lac.G2). In one instance, an example of dual aromatic stacking was
obtained (TNT.R3). In all cases, the hydrogen-bonding potential (donor, acceptor) of
the functional groups on the ligand is largely satisfied.

Twenty designs predicted by the automated design procedure were selected for
experimental characterization (Table 1). The predicted mutations (ranging from five to
seventeen amino acid changes) were constructed by PCR mutagenesis of the wild-type
receptor scaffold genes (7). Proteins were over-expressed, purified, and modified with
thiol-reactive styryl dyes conjugated to cysteine residues introduced by mutation at
locations where the fluorescence emission intensity of the dye responds to a
ligand-mediated hinge-bending motion of the receptor (7). Ligand-binding affinities (Table 2)
were determined by titration, monitoring ligand-dependent changes in fluorescence
emission intensity that were fit to single-site binding isotherms (7) (Fig. 4). In all cases,
wild-type receptors show no change in fluorescence intensity upon addition of target
ligands. Conversely, the mutant receptors respond only to target and not wild-type
ligands. A wide range of affinities is observed, down to the nanomolar level (TNT.R3).
To probe the specificity of interaction, the affinities of a number of closely related
ligands (Fig. 2B) were also determined (Table 2). The thermostabilities of a
representative subset of apo-receptors (Fig. 5) showed that cooperative folding
transitions are retained with a slight loss of stability relative to the wild-type proteins.
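Fitting a titration to a single-site binding isotherm can be sketched as follows, assuming the common form F([L]) = F0 + ΔF·[L]/(Kd + [L]) with free ligand approximated by total ligand. The functional form, the grid search over Kd, and all names are assumptions made for illustration; the fitting procedure actually used is described in reference (7), not here.

```python
def isotherm(ligand, kd, f0, df):
    """Single-site binding isotherm: signal as a function of ligand concentration."""
    return f0 + df * ligand / (kd + ligand)


def fit_kd(ligand, signal, kd_grid):
    """Grid search over Kd; for each Kd, F0 and dF follow from linear least squares.

    Returns (kd, f0, df) for the grid point with the smallest sum of squared errors.
    """
    best = None
    n = len(ligand)
    for kd in kd_grid:
        frac = [l / (kd + l) for l in ligand]      # fractional saturation at this Kd
        mean_x = sum(frac) / n
        mean_y = sum(signal) / n
        sxx = sum((x - mean_x) ** 2 for x in frac)
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(frac, signal))
        df = sxy / sxx
        f0 = mean_y - df * mean_x
        sse = sum((isotherm(l, kd, f0, df) - y) ** 2 for l, y in zip(ligand, signal))
        if best is None or sse < best[0]:
            best = (sse, kd, f0, df)
    return best[1], best[2], best[3]
```

A log-spaced grid such as `[10 ** (i / 20) for i in range(-40, 61)]` (in the same concentration units as the titration) covers four orders of magnitude; a nonlinear least-squares routine would normally replace the grid search in practice.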
Every designed receptor exhibits detectable affinity for its target ligand. In the
case of the TNT designs, all six receptors can distinguish the absence of a single nitro
group (2,4- and 2,6-dinitrotoluene), and with the exception of the ABP design, the
absence of a single methyl group (trinitrobenzene). Introduction of an additional point
mutation suggested by visual inspection of the model to improve packing is sufficient
to achieve the desired selectivity in this ABP design (Fig. 6). All ten L-lactate designs
exhibit the desired chiral stereospecificity, selecting L-lactate over both the D-lactate
enantiomer and pyruvate, the prochiral, oxidized form of lactate (Fig. 6). Similarly, all
three D-lactate designs show specificity for D-lactate over L-lactate and pyruvate. The
single serotonin design shows significantly lower affinity for tryptamine (absence of a
hydroxyl) and tryptophan (absence of a hydroxyl, presence of a carboxylate). The
relative free energy corresponding to the loss of a hydrogen bond in a decoy ligand (1-5
kcal/mol; Fig. 6) is consistent with the observed range of weak and strong hydrogen
bonds (18). The automated computational design procedure therefore reliably predicts
mutant receptors that attain ligand binding with the desired, drastically altered
specificity, consistent with correct modelling of critical elements of molecular
recognition: shape, functional groups, and chirality.
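The relative free energies quoted above relate to ratios of dissociation constants through the standard relation ΔΔG = RT ln(Kd,decoy / Kd,target). The short sketch below evaluates it; the function name and the choice of 298.15 K are assumptions for illustration, and the 10-fold ratio is a made-up example rather than a value from Table 2.

```python
import math

R_KCAL_PER_MOL_K = 1.987e-3  # gas constant in kcal/(mol*K)


def relative_binding_free_energy(kd_target, kd_decoy, temperature_k=298.15):
    """Free energy penalty (kcal/mol) for binding the decoy instead of the target.

    Both dissociation constants must be in the same concentration units.
    """
    return R_KCAL_PER_MOL_K * temperature_k * math.log(kd_decoy / kd_target)


# Hypothetical example: a decoy bound 10-fold more weakly than the target.
penalty = relative_binding_free_energy(kd_target=1.0, kd_decoy=10.0)
print(f"{penalty:.2f} kcal/mol")
```

Under these assumptions a 10-fold loss in affinity corresponds to roughly 1.4 kcal/mol, which sits at the lower end of the 1-5 kcal/mol range cited for a lost hydrogen bond.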

The affinities of the wild-type receptors for their cognate ligands fall in the 0.1
µM to 1.5 µM range (7). Two of the three TNT designs in RBP also fall into this range
(Table 2); the binding behaviour of these computationally designed receptors is
therefore indistinguishable from that of naturally evolved PBPs. It has been observed that the
maximal binding affinity for many ligands is correlated with the number of
non-hydrogen atoms (66). The affinity of one TNT design, TNT.R3, is 2 nM, corresponding
to its empirically expected value. The single serotonin design does not attain the
expected nanomolar affinity. The affinity of the fully automated design (Stn.A1) is 50
µM, and is improved to 4.7 µM by introduction of a single point mutation predicted to
improve packing interactions between the receptor and ligand. Several of the lactate
designs have micromolar affinities (one has slightly sub-micromolar affinity),
approaching the expected maximal value for a six-atom ligand (0.3 µM).

High-affinity receptors are successfully identified within the top ten ranked
designs for each ligand, corresponding to a tiny fraction of the available search space.
Nevertheless, the designs exhibit a significant spread in ligand-binding affinities, both
for a given ligand in a particular scaffold, and between scaffolds. The likelihood that a
protein scaffold can be mutated to accept a new target ligand ("adaptive potential") is
also variable. The observed range of affinities can be rationalized with an empirical
quantitative structure-activity relationship (QSAR) that provides empirically fit weights
for the DEE force-field components (steric clashes, unsatisfied hydrogen bonds) and
takes into account additional factors not modelled by the DEE force field (hydrophobic
contact areas, electrostatics, volume ratio of wild-type to target ligands as a measure of
adaptive potential). This QSAR (Fig. 7) provides direct reciprocity between theory and
experiment.

RBP and GBP control chemotaxis of E. coli towards sugars, mediated by a
two-component signal transduction pathway (83). This response can be reconnected to gene
regulation by constructing a synthetic signal transduction pathway that controls
transcriptional upregulation of a β-galactosidase reporter gene (84) (Fig. 8A). The
biological activities of the TNT and L-lactate designs in RBP and the L-lactate designs
in GBP were tested in this pathway, replacing wild-type RBP and GBP with designed
receptors. Wild-type receptors mediate increases in reporter gene expression in
response to ribose or glucose, but not TNT or L-lactate. Conversely, all the redesigned
receptors respond to their cognate, but not wild-type ligands. The dose-response curves
of the TNT-binding RBP receptors follow the same order as the intrinsic ligand-binding
affinities (Fig. 8B). The redesigned receptors therefore mediate signal transduction to
extracellular TNT or L-lactate, as intended.

EXAMPLE 2

Another application for the computational design of binding sites is the
development of biosensors that detect chemical pollutants or threats. PMPA is a
relatively nontoxic surrogate and the predominant hydrolytic degradation product of
soman, a member of the organophosphate nerve agent family and a potent suicide
inhibitor of acetylcholinesterase. Soman degrades rapidly upon exposure to water and forms
PMPA. PMPA is only found following exposure to soman, and may even be present in
the leading edge of a nerve agent cloud. Detection of PMPA is therefore important for
weapons control, post-incident exposure determination and cleanup, and may prove
useful as an attack indicator in a stand-off detector. Neither PMPA nor soman has an
intrinsic chromophore or fluorophore. Therefore, a reagentless fluorescent biosensor for
PMPA that responds rapidly and continuously is of great potential benefit for
monitoring and control of this agent.

The ReceptorDesign component of the DEZYMER suite was used to generate
designs of mutant receptors. This design process consisted of eight stages (Fig. 10).
Stage 1: the internal degrees of freedom within the ligand are sampled to identify
low-energy ligand conformations (the internal ligand ensemble, ILE). A single,
minimum-energy conformer of the PMPA R-isomer was used in this study. Stage 2: a rotational
ligand ensemble (RLE) is prepared in the absence of protein coordinates, sampling
Eulerian rotations around the three principal molecular axes of the ligand (2.5°
intervals, about 10^6 poses). Stage 3: a pocket for the new binding site is identified,
using the original ligand to locate the layer of residues that are in direct van der Waals
or hydrogen bonding contact (the primary complementary surface, PCS). Stage 4:
residues in the PCS (excepting glycines or prolines) are replaced with alanine,
generating a truncated protein scaffold representing a PCS for which no sequence has
been determined yet. Stage 5: the RLE is placed on each point of a cubic grid (0.5 Å
spacing) within the convex hull which envelops the ligand van der Waals surface. Stage
6: a placed ligand ensemble (PLE) is constructed by selecting members from these
RLEs that are sterically compatible with the truncated scaffold, and confined within the
convex hull (> 90% of ligand atoms). Stage 7: for each of the top 10,000 docked ligands
(selected from the PLE by choosing ligands with the fewest interactions with the
truncated scaffold) a PCS is calculated. In this calculation, a side-chain rotamer library
(an expanded version of (45) containing 6,122 rotamers) representing all possible
mutations (except cysteine or proline) and side-chain conformations is placed at all
positions in the PCS, and a sequence corresponding to the global minimum energy of a
pairwise-decomposed potential function is identified by a dead-end elimination
algorithm (24). This potential function is based on a semi-empirical force field that
includes a modified Lennard-Jones potential to represent "fuzzy" van der Waals
interactions (11, 24, 88) (parameters for amino acids and PMPA taken from
CHARMM22 (43) and a universal force field approximation (42, 89), respectively), an explicit
geometry-dependent hydrogen-bonding term (11, 24, 88), a continuum solvation term
to represent the hydrophobic effect with terms favoring or disfavoring burial of polar or
nonpolar groups (11, 24, 88), and a linear term to account for differences in side-chain
entropy (E_s = wRT ln N, where N is the number of free torsions in the side chain, and w a
weight; typically 1.0). Electrostatic contributions were not included in the calculations.
The search algorithm maintains the ligand hydrogen bond inventory, selecting
complementary sequences with minimal unsatisfied hydrogen bonds between ligand
and protein. All PMPA oxygens were classified as hydrogen bond acceptors. Stage 8:
the predicted designs were ranked by four independent criteria: van der Waals contacts,
hydrogen bonding energies between protein and ligand, the number of unsatisfied
ligand hydrogen bonds, and exposed cavities within the binding pocket. Suitable
designs were selected by taking the intersection of the top 10% of each ranked list. This
linear optimization method optimizes fitness functions with components of different
magnitudes and ranges. The final choice is based on visual inspection of the molecular
models. The design algorithm described here includes enhanced ligand sampling
(stages 5 and 6) and introduction of the final selection by linear optimization (stage 8).
The calculations were parallelized at stages 4 and 6, and carried out on a Beowulf
cluster of twenty 1.7 GHz processors in about two days per combination of scaffold and
ligand.
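The stage 8 selection (intersection of the top 10% of each ranked list) can be sketched as below. This is an illustrative reading of the text, not the DEZYMER code: the function names, the dictionary-of-scores representation, and the per-criterion "lower is better" flag are assumptions.

```python
def top_fraction(scores, fraction=0.10, lower_is_better=True):
    """Return the design names in the best `fraction` of one ranked list."""
    ranked = sorted(scores, key=scores.get, reverse=not lower_is_better)
    keep = max(1, round(len(ranked) * fraction))
    return set(ranked[:keep])


def select_designs(criteria, fraction=0.10):
    """Intersect the top fraction of every ranked list.

    `criteria` is a list of (scores, lower_is_better) pairs, one per ranking
    criterion, where `scores` maps a design name to its value for that criterion.
    """
    selected = None
    for scores, lower_is_better in criteria:
        top = top_fraction(scores, fraction, lower_is_better)
        selected = top if selected is None else selected & top
    return selected if selected is not None else set()
```

Because each criterion is ranked separately before intersecting, criteria with very different magnitudes and ranges contribute equally, which matches the motivation given in the text; the intersection may be empty, in which case the fraction would have to be relaxed.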

Mutations were introduced into the RBP and the GBP genes using overlap
extension polymerase chain reaction (90). A single cysteine was introduced in each of
the constructs (RBP: Cys 265; GBP: Cys 112) for covalent attachment of a fluorescent
reporter (7). Constructs were cloned with a carboxy terminal decahistidine tag in a
pET21a expression vector using 5' XbaI and 3' EcoRI restriction sites. Mutations in the
coding sequence were confirmed by DNA sequencing. Expression of mutant proteins
was confirmed by MALDI-TOF mass spectrometry. His-tagged protein was purified by
immobilized metal affinity chromatography on Ni2+ matrix and labeled with a reporter
fluorophore conjugated through a thiol of the cysteine residue introduced near the hinge
by site-directed mutagenesis. For GBP designs, all buffers contained 1 mM CaCl2.

Ligand binding was measured by direct titration into a solution of covalently
labeled protein (10 nM to 100 nM), and monitoring changes in fluorescence emission
intensity at 25°C (7).

The binding pockets of RBP (PDB code: 2DRI) (91) and GBP (PDB code:
2GBP) (92) were redesigned to bind PMPA by the ReceptorDesign component of the
DEZYMER suite, with eleven and twelve residues forming the primary complementary
surface (PCS) in each receptor, respectively. The algorithm uses the three-dimensional
structure of a protein to predict sequences and structures of binding sites that are
complementary to a docked ligand (Fig. 10). A combinatorial search procedure
simultaneously optimizes sequence choice and ligand docking to identify mutations that
form complementary surfaces. Three RBP and twelve GBP designs were constructed
by site-directed mutagenesis and their ligand-binding properties were determined (Figs.
11-12; Table 3).
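
The idea of optimizing sequence choice and ligand docking simultaneously can be illustrated with a toy sketch. It uses exhaustive enumeration, which is a stand-in chosen for clarity and not the search procedure used by the design software; the function names, residue choices, and energy values are all hypothetical.

```python
from itertools import product


def best_design(poses, candidates, ligand_energy, pair_energy):
    """Exhaustively score every (ligand pose, residue combination) pair.

    `candidates[i]` lists the residue choices allowed at PCS position i.
    Returns (energy, pose, residues) for the lowest-energy combination.
    """
    best = None
    for pose in poses:
        for combo in product(*candidates):
            energy = sum(ligand_energy(pose, res) for res in combo)
            energy += sum(
                pair_energy(combo[i], combo[j])
                for i in range(len(combo))
                for j in range(i + 1, len(combo))
            )
            if best is None or energy < best[0]:
                best = (energy, pose, combo)
    return best


# Hypothetical toy energies (arbitrary units), for illustration only.
LIGAND_TABLE = {("out", "Ser"): -1.0, ("out", "Asn"): -1.5, ("in", "Phe"): -2.0}


def toy_ligand_energy(pose, res):
    return LIGAND_TABLE.get((pose, res), 0.0)


def toy_pair_energy(res_a, res_b):
    # Penalize one arbitrary residue pair to mimic a steric clash.
    return 1.0 if {res_a, res_b} == {"Ala", "Phe"} else 0.0


result = best_design(
    poses=["in", "out"],
    candidates=[["Ser", "Ala"], ["Asn", "Phe"]],
    ligand_energy=toy_ligand_energy,
    pair_energy=toy_pair_energy,
)
```

Exhaustive enumeration scales exponentially with the number of positions, so it is only workable for toy inputs such as this one.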

Each design corresponds to a separate PCS and a distinct orientation of the
docked PMPA molecule. In all cases PMPA is sequestered within the binding site, with
no direct contact with bulk solvent. In the majority of the designs the methyl
phosphonate group points out towards the solvent. In the case of the PG10 design in
GBP, however, this group is oriented inwards (Fig. 12). In all designs the hydrogen
bonding potential of both phosphonate anionic oxygens as well as the phosphoester
oxygen is satisfied.

The majority of the designs were built in GBP, and were selected from the top
50 ranked designs (Fig. 11), sampling both low- and high(er)-energy designs. The
twelve PCS residues of the GBP designs can be divided into three groups according to
the sequence diversity observed within the family of designs (Table 3): constant (92_I,
152_II, 236_II), highly conserved (211_II, 256_II), and variable (10_I, 14_I, 16_I, 91_I,
154_II, 158_II, 183_II). The constant and highly conserved positions all differ from the
wild-type protein. Two of the three constant residues arise from a change in function
between the designs and the wild-type receptor. In wild-type GBP, Lys92_I and His152_II
form hydrogen bonds to glucose. In most designed PMPA receptors, Ser92_I and Asn152_II do
not interact with the ligand (in PG12, Asn152_II forms an additional hydrogen bond with
PMPA), but participate in a hydrogen-bonding network connecting the N- and C-terminal
domains. This network may function as a "latch" that stabilizes the closed
form (Figs. 11A-B). The third constant residue (Ala236_II) is constrained by steric
differences between glucose and PMPA. In wild-type GBP, Asp236_II forms a hydrogen
bond to glucose; in all designs the PMPA position precludes choice of any amino acid
but alanine or glycine at this position. The highly conserved positions 211_II (Ser or
Asn) and 256_II (Ser or His) also have switched from ligand binding (Asn211_II and
Asn256_II interact with the O3 and O4 glucose hydroxyls, respectively) to structural
functions. In eleven designs Ser211_II forms a hydrogen bond with the main-chain
carbonyl of Val235_II; in three designs Asn211_II interacts both with the amide
proton of Met214_II and the carbonyl of His183_II. In the majority of the designs Ser256_II
forms a hydrogen bond with Gln261_II outside the PCS (with the exception of PG10,
where Ser211_II forms a hydrogen bond to PMPA).

The designs leave a cavity between Ser256_II and the PMPA pinacolyl group.
The penalty for solvent accessibility of the hydrophobic ligand moiety apparently was
insufficient to overcome the reward for forming the inter-residue hydrogen bond. We
constructed additional point mutations at position 256_II in designs PG4 and PG12 to fill
this cavity (PG4_256F and PG12_256F; Table 3).

Sequences at the variable positions are diverse: on average 33% of the residues
differ among the designs, reflecting alternative ways for providing hydrogen bonds and
hydrophobic surfaces. The designs vary in the PCS positions at which hydrogen-bonding
side chains are placed.

The three designs constructed in RBP also exhibit variations in sequence
diversity and residue function switching. In PR8, Ser235_II is associated with a defect
analogous to Ser256_II in GBP. Ser235_II makes no direct contacts with PMPA, but forms
a hydrogen bond with the hydroxyl of Ser103_II, resulting in a cavity near the pinacolyl
group. To fill this cavity, additional point mutations were constructed in the RBP
design PR8 at position 235_II (Table 3).

All three RBP designs and ten of the twelve primary GBP designs expressed
soluble protein; one GBP design did not express, while another precipitated upon
purification. Several of the mutants were less stable than the parent proteins (GBP,
58°C; RBP, 60°C), having thermostabilities that range from 32°C to 58°C as
determined by thermal denaturation, monitoring circular dichroism (88).

Of the eighteen fluorescent conjugates prepared by labeling with thiol-reactive 
fluorophores at Cys256 (RBP) or Cys112 (GBP), twelve show changes in fluorescence
upon addition of PMPA (Fig. 13). Neither wild-type RBP nor GBP conjugates respond 
to PMPA. 

Observed PMPA affinities range from 68 nM (PR8) to 10 µM (PG18) (Table 3).

Some of the cavity-filling mutations constructed at position 235_II in RBP show
improvements in affinity. Phenylalanine at position 235_II increases the affinity of the
receptor for PMPA (K_d = 45 nM), while Ala235_II or Ile235_II has no effect (Table 3).
The equivalent mutation at position 256_II in GBP (PG4_256F, K_d = 0.4 µM;
PG12_256F, K_d = 0.11 µM) has similar effects on binding.

The ligand-binding specificity of two designs was tested by measuring affinities
for isopropyl methyl phosphonic acid (IMPA) (Fig. 9), the hydrolysis product of the
nerve agent sarin. PG10 and PG12 bind IMPA approximately 10-fold less tightly than
PMPA (K_d = 7 µM and K_d = 2 µM, respectively), indicating significant discrimination
between the aliphatic groups of the two molecules.

The affinities of the designs for pinacolyl alcohol (PA) and methyl phosphonate
(MP), representing the aliphatic and hydrophilic moieties of PMPA respectively (Fig.
9), were determined (Table 3). The K_d values of the receptors for PA and MP are 10^2- to
10^4-fold and 10^4- to 10^5-fold higher than those for PMPA, respectively. A coupling
energy (93), $\Delta G_c$, can be defined as:

$\Delta G_c = \Delta G_{b,PMPA} - (\Delta G_{b,PA} + \Delta G_{b,MP})$,

where $\Delta G_{b,PMPA}$, $\Delta G_{b,PA}$, and $\Delta G_{b,MP}$ are the binding energies
($RT \ln K_d$) for PMPA, PA, and MP, respectively.
Favorable inter-fragment interactions result in $\Delta G_c < 0$, unfavorable ones in
$\Delta G_c > 0$. Analysis of fragment binding is typically used to assess strain or
entropic factors within a ligand (93). Here, differences in $\Delta G_c$ between designs are
interpreted as strain within the designed proteins, reflecting differences in the
structural complementarity between a design and its bound ligand. Figure 14 reveals a
positive correlation between $\Delta G_c$ and the affinity for PMPA: as $\Delta G_c$
decreases, $\Delta G_{b,PMPA}$ becomes more favorable. Decreases in fragment
strain therefore correlate with increased receptor affinities, and indicate differences in
the complementarity of the designed surfaces.
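
As a worked illustration of the coupling energy defined above, the following minimal Python sketch evaluates it from three dissociation constants. The K_d values are hypothetical placeholders (the Table 3 values are not reproduced here). The temperature of 25°C matches the titration conditions stated earlier, and the 1 M standard state is an assumption.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol K)


def binding_energy(kd_molar, temp_k=298.15):
    """Binding energy RT ln(K_d) in kcal/mol, with K_d in mol/L (1 M standard state assumed)."""
    return R_KCAL * temp_k * math.log(kd_molar)


def coupling_energy(kd_pmpa, kd_pa, kd_mp, temp_k=298.15):
    """Coupling energy: dG_b(PMPA) - (dG_b(PA) + dG_b(MP))."""
    return binding_energy(kd_pmpa, temp_k) - (
        binding_energy(kd_pa, temp_k) + binding_energy(kd_mp, temp_k)
    )


# Hypothetical dissociation constants: 100 nM (PMPA), 100 µM (PA), 10 mM (MP).
dg_c = coupling_energy(1e-7, 1e-4, 1e-2)  # about -1.36 kcal/mol
```

With these placeholder numbers the result is negative, that is, the whole ligand binds more tightly than the sum of its fragments would suggest, which corresponds to favorable inter-fragment coupling.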

The contributions of specific interactions were tested by alanine-scanning
mutagenesis in two designs, PG10 and PG12 (Table 4) (94), which bind PMPA in
opposite orientations. In the PG10 design (MP moiety points inwards), mutation of
residues predicted to hydrogen bond to an anionic oxygen (O1, PG10_S211A) or the
phosphoester oxygen (O3, PG10JO83A) results in a 2.1 and 2.4 kcal/mol loss of
binding energy, respectively, consistent with typical hydrogen bonding contributions
(18). Loss of the predicted interaction between Ser256_II and the other anionic oxygen
has no appreciable effect (O2, PG10_S256A), potentially indicating that this hydrogen
bond is absent. Ser256_II is also predicted to form a hydrogen bond with Gln261_II. The
two interactions therefore may compete rather than co-exist. In the PG12 design (MP
moiety points outwards), mutation of residues predicted to hydrogen bond to the anionic
oxygens (O1, PG12_N152A; O2, PG12_S154A; O1, PG12_H183A) results in a 2-3
kcal/mol loss of binding energy, consistent with the model (Table 4).
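
To relate the reported energy losses to measured affinities, the short Python sketch below converts between a change in binding energy and the corresponding fold-change in K_d. It assumes the standard relation ddG = RT ln(K_d,mutant / K_d,parent) at 25°C; the function names are illustrative.

```python
import math

RT_KCAL = 1.987e-3 * 298.15  # RT in kcal/mol at 25 °C


def ddg_from_kd(kd_mutant, kd_parent):
    """Loss of binding energy (kcal/mol); positive when the mutant binds more weakly."""
    return RT_KCAL * math.log(kd_mutant / kd_parent)


def fold_change_from_ddg(ddg_kcal):
    """Fold-increase in K_d that corresponds to a given loss of binding energy."""
    return math.exp(ddg_kcal / RT_KCAL)


fold_21 = fold_change_from_ddg(2.1)  # roughly 35-fold weaker binding
fold_24 = fold_change_from_ddg(2.4)  # roughly 57-fold weaker binding
```

Under this assumption, the 2.1 and 2.4 kcal/mol losses quoted above correspond to K_d increases of roughly 35-fold and 57-fold.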

Van der Waals interactions were also investigated. In PG10, Tyr154_II interacts
with the pinacolyl moiety of PMPA and hydrogen bonds to Thr110_II. Loss of these
predicted interactions decreases binding by 2.4 kcal/mol (Table 4). Furthermore,
binding of PA, but not MP, is affected, consistent with the orientation of PMPA in the
model. Similarly, in PG12, Asn211_II forms van der Waals interactions with the
pinacolyl moiety and hydrogen bonds to the backbone carbonyl of position 214_II. Loss
of these predicted interactions (PG12_N211A) results in a decreased affinity for
PMPA, but to a lesser extent (0.9 kcal/mol) than is observed for Tyr154_II in PG10.
Again, as expected, PA, but not MP, binding is affected.

Alanine-scanning mutagenesis has also demonstrated that the inter-domain 
latch, contributed by constant residues Ser92_I and Asn152_II, is important for binding
(Table 4). Mutations of either residue decrease binding, as expected for the removal of 
an interaction that stabilizes the closed state (95, 96). 

The Ser256_II-to-Ala mutation in PG12 exhibits the largest change in affinity (4
kcal/mol) (Table 4). This residue is not predicted to interact directly with PMPA;
instead, it hydrogen bonds to Gln261_II, leaving a cavity. Enlargement of this putative
cavity in the alanine mutant is predicted to trap water near the hydrophobic pinacolyl
moiety, thereby decreasing the affinity for PMPA. Loss of PA binding and retention of MP
binding are observed in this mutant, consistent with this interpretation.

The designs introduced 9 to 12 mutations in the parent proteins. Twelve of
twenty designs tested exhibited PMPA-dependent changes in emission intensity of a
fluorescent reporter with affinities between 45 nM and 10 µM. The contributions to
ligand binding by individual residues were determined in two designs by
alanine-scanning mutagenesis, and are consistent with the molecular models. These results
demonstrate that designed receptors with radically altered binding specificities and
affinities that rival or exceed those of the parent proteins can be successfully predicted.
The designs vary in parent scaffold, sequence diversity, and orientation of docked 
ligand, suggesting that the number of possible solutions to the design problem is large 
and degenerate. This observation has implications for the genesis of biological function 
by random mutagenic processes. 

About 50% of the computer-generated designs show PMPA-mediated changes
in fluorescence of the covalently coupled reporter groups (57% if designs that do not
express or that precipitate are discounted). This success rate represents a lower bound,
because false negatives can arise if the equilibrium between the open and closed states
is sufficiently altered to preclude their interconversion, or if the fluorophore no longer
interacts differentially with these two conformations.

PMPA affinities of the designed receptors range from 45 nM to 10 µM. RBP
and GBP bind their cognate sugars with 0.2 µM and 0.5 µM affinities, respectively (7).
Empirical limits have been established for the ligand affinities of naturally evolved
proteins (97). For PMPA this limit ranges from about 2 nM to about 1 µM. The
affinities of many designs reported here fall within this range and rival or exceed those
of the parent receptors.

Selected designs sample both high- and lower-ranked candidates. Designs 
selected from the top 20 exhibit higher affinities for PMPA than those selected from 
lower-ranked designs (Fig. 12). Analysis of the affinities for PA and MP suggests that
the designed receptors differ in the strain they impose upon the ligand (Fig. 14) (93).

The effects of individual alanine mutations on PMPA binding in designs PG10 
and PG12 are mostly consistent with the predicted interactions. Furthermore, the
designed receptors distinguish steric differences between the aliphatic moieties of
PMPA and IMPA (Fig. 9). We therefore conclude that predicted molecular models of 
the designs are largely correct. 

The designs contain defects, indicating that the computational design methods
require further improvements. Virtually all designs have a cavity between the protein
and bound ligand in the vicinity of the hinge region. This cavity defect is likely to be a
consequence of inaccurate modeling of relative contributions by hydrogen bonds, polar
group burial, solvent accessibility, and omission of electrostatic contributions.
Nevertheless, the experimentally validated ligand-binding properties of the designs
reported here demonstrate that even relatively simple representations of atomic
interactions are sufficiently powerful to capture dominant effects of biomolecular
recognition in design calculations.

The designed PCS has fewer residues that make direct contacts with the ligand
than those in the wild-type receptors. Consequently, a significant fraction of the side
chains switch function from ligand binding in the wild-type receptor to a structural role
in the designed receptors and lack sequence diversity. The residues that interact directly
with the ligand, however, are highly diverse and depend on the orientation of the bound
ligand. Thus even in this small set of designs, significant diversity in structure and
sequence is observed, suggesting that solutions to the design problem are highly
degenerate. These observations presumably reflect a fundamental characteristic of
protein sequences, since potential diversity is an essential prerequisite for the genesis of
function by the random processes of organic evolution (98).

The receptors described here can function as reagentless fluorescent biosensors
for PMPA with a lower detection limit of about 4 nM (about 1 ppb). Given the
structural similarities between soman and PMPA, the designed receptors are likely to
bind soman with affinities similar to those for PMPA. The detection limit is probably
sufficient for the development of stand-off or post-incident detectors of soman, and
rivals the lower limits of current methods. Unlike acetylcholinesterase-based assays, the
designed receptors described here do not rely on the presence of soman, which rapidly
degrades to form PMPA. Other techniques require several components and longer
preparation, incubation, and detection times. A reagentless fluorescence biosensor has
significant advantages such as rapidity of the fluorescent response, reversibility, and
simplicity. The molecular recognition element in a deployable biosensor must be
sufficiently robust to withstand field conditions. The designed receptors reported here
do not yet meet this standard, since their thermostability may not be sufficiently high.
Nevertheless, computationally designed receptors represent an initial stage in the
development of a novel class of biosensors for the rapid, continuous, and accurate
detection of nerve agents.
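
The conversion between the quoted 4 nM detection limit and "about 1 ppb" can be checked with a short Python sketch. It assumes the molecular formula C7H17O3P for PMPA and a dilute aqueous solution in which 1 µg/L corresponds to 1 ppb by mass; both assumptions are supplied here and are not stated in the text.

```python
# Standard atomic masses in g/mol, rounded to three decimals.
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "O": 15.999, "P": 30.974}

# Assumed molecular formula of PMPA (pinacolyl methyl phosphonic acid).
PMPA_FORMULA = {"C": 7, "H": 17, "O": 3, "P": 1}


def molar_mass(formula):
    """Molar mass in g/mol from an {element: count} mapping."""
    return sum(ATOMIC_MASS[element] * count for element, count in formula.items())


def nanomolar_to_ppb(conc_nm, mass_g_per_mol):
    """Convert nmol/L to µg/L, taken as ppb by mass in dilute water."""
    grams_per_litre = conc_nm * 1e-9 * mass_g_per_mol
    return grams_per_litre * 1e6


pmpa_mass = molar_mass(PMPA_FORMULA)              # about 180.2 g/mol
detection_ppb = nanomolar_to_ppb(4.0, pmpa_mass)  # about 0.7 ppb
```

Under these assumptions 4 nM corresponds to roughly 0.7 ppb, consistent with the order-of-magnitude figure of about 1 ppb.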

EXAMPLE 3 

Enzymes are amongst the most proficient catalysts known (99), and catalyze a
wide variety of reactions in aqueous solutions under ambient conditions with exquisite
selectivity and stereospecificity. Catalysis takes place in tailored pockets that
simultaneously optimize binding of reactants, intermediates, transition states, and
products, orient reactive residues, stabilize transition states, select catalytically
competent substrate conformations, and dynamically interconvert between microstates
(100, 101). The rational design of enzymes has tremendous practical potential for
developing novel synthetic routes (73, 102), but presents a formidable challenge and is
one of the most stringent tests for understanding protein chemistry. Here we present
structure-based computational design techniques that predict mutations for the
construction of catalytically active sites in proteins of known structure. Using these
methods, we converted ribose-binding protein (62) into analogs (NovoTims) of the
glycolytic enzyme triose phosphate isomerase (103). Several NovoTims exhibit rate
enhancements of approximately 10^5 to 10^6 and are biologically active, supporting
growth of Escherichia coli under gluconeogenic conditions. The inherent generality of
computational design implies that it may be possible to design many enzymes by this
approach.

Triose phosphate isomerase (TIM) is an essential component of the
Embden-Meyerhof pathway (104), interconverting dihydroxyacetone phosphate (DHAP) and
glyceraldehyde-3-phosphate (GAP) (Fig. 15A). In glycolysis TIM channels these two
triose phosphate products of aldolase into pyruvate; in gluconeogenesis TIM ensures
that both substrates are supplied to aldolase. The isomerization reaction involves two
successive proton exchanges (103) (Fig. 15B), and is considered an archetype for
proton transfer chemistry, which is central to many enzyme mechanisms (105).
Extensive studies support a mechanism (103) whereby a carboxylate abstracts the
DHAP pro-R proton at C1 to form a cis-enediol(ate) intermediate, followed by
imidazole-mediated proton transfer between the C1 and C2 oxygens, yielding GAP.
The C1 proton pK_a of about 18 imposes a large barrier to proton abstraction (106),
which is overcome by a low-barrier hydrogen bond (107) (LBHB) that requires precise
functional group alignment (108-110). Transition states are further stabilized
electrostatically by lysine (109, 110). TIM also selects a substrate conformation that
minimizes alignment of the enediolate double bond and phosphate π systems, thereby
stereoelectronically disfavoring an undesirable β-elimination of the phosphate (111)
that produces methylglyoxal (MG), which is cytotoxic in excess (112). A mobile loop
permits substrate access and sequesters the reaction from solvent (113) (Fig. 15C). The
TIM reaction therefore presents a complex design target demanding simultaneous
capture of many mechanistic principles: acid-base catalysis, transition state
stabilization, reactive group alignment, low-barrier hydrogen bonds, stereoelectronic
control by ground state selection, electrostatic effects, and protein dynamics.

Here we demonstrate that structure-based computational design techniques can
be used to introduce isomerase activity into the bacterial ribose-binding protein (RBP),
which is a periplasmic receptor that has no known catalytic activity. RBP is a monomer
and consists of two domains linked by a hinge region (62) (Fig. 15C). The protein
adopts two conformations, a ligand-free open form and a ligand-bound closed form,
which interconvert via hinge-bending motions. Analogous to TIM, the ribose ligand is
sequestered from solvent in the closed form. TIM is a homodimer of α/β barrel
monomers (109, 110) (Fig. 15C). RBP and TIM structures fall into different topological
classes. Introduction of TIM activity into RBP is therefore equivalent to convergent
evolution by computational design.

Initially we tested whether RBP can be redesigned to bind GAP and DHAP,
without regard to catalytic activity. The design algorithm predicted mutations that
convert RBP (PDB code: 2DRI for wild-type sequence) into a receptor for DHAP by
changing the layer of residues directly contacting ribose in the wild-type protein
structure. Sequences that form stereochemically complementary ligand-binding
surfaces were identified using a combinatorial optimization algorithm that integrates
ligand docking and placement of amino acid side-chain rotamer libraries to locate
energetic minima in a potential function incorporating van der Waals, hydrogen
bonding, solvation, and electrostatic interactions (87) between the amino acids and
ligand. Four designs bind DHAP and GAP with micromolar affinities (Fig. 16A) but
exhibit no TIM activity. This experiment shows that RBP can be mutated to bind both
substrates, which is a necessary preliminary finding prior to the introduction of
catalysis.

To include catalytic activity in the receptor, we developed a new procedure that 
introduces catalytically active residues into the receptor design process (Fig. 17A). 
First, a geometrical definition of key interactions contributing to catalysis is generated. 
Second, a combinatorial search algorithm (85) identifies positions where placement of 
catalytic residues and substrate simultaneously satisfies these geometrical constraints. 
Third, the remainder of the complementary surface is generated around the placed 
substrate using the receptor design algorithm. Designs were generated using the 
allowed geometrical relationships between the enediolate reaction intermediate, 
glutamate, histidine, and lysine as a minimalist model of interactions that are critical to 
catalysis (Fig. 17B). We tested fourteen designs subdivided into three families that 
differ in placement of these three catalytic residues (Table 5). Seven designs show 
increases in GAP production over background. One design, NovoTim1.0, is
significantly more active. It exhibits saturation kinetics (Table 6), and is competitively
inhibited by phosphoglycolate (K_i = 130 µM), a known inhibitor of wild-type TIM
(103) (K_i = 4 µM).
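
Saturation kinetics with competitive inhibition, as reported for NovoTim1.0, can be illustrated with the standard Michaelis-Menten rate law in a minimal Python sketch. Only the K_i of 130 µM comes from the text; the k_cat and K_M values below are hypothetical placeholders, since Table 6 is not reproduced here.

```python
def rate_per_enzyme(substrate_um, kcat, km_um, inhibitor_um=0.0, ki_um=float("inf")):
    """Michaelis-Menten turnover rate per enzyme molecule with competitive inhibition.

    A competitive inhibitor leaves kcat unchanged and raises the apparent K_M
    by the factor (1 + [I]/K_i).
    """
    km_apparent = km_um * (1.0 + inhibitor_um / ki_um)
    return kcat * substrate_um / (km_apparent + substrate_um)


KI_PHOSPHOGLYCOLATE_UM = 130.0  # from the text
KCAT = 1.0                      # hypothetical, arbitrary units
KM_UM = 100.0                   # hypothetical

uninhibited = rate_per_enzyme(100.0, KCAT, KM_UM)
inhibited = rate_per_enzyme(
    200.0, KCAT, KM_UM, inhibitor_um=130.0, ki_um=KI_PHOSPHOGLYCOLATE_UM
)
```

With the inhibitor present at a concentration equal to K_i, the apparent K_M doubles, so twice as much substrate is needed to reach half-maximal rate, while the maximal rate itself is unchanged.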

NovoTim1.0 is less thermostable than the parent protein (Fig. 18A). We
postulated that steric imperfections in the interactions between the designed binding
surface residues and the surrounding protein matrix cause this decreased stability.
Previously, we have established that in RBP-based metalloprotein designs, stability is
restored by designing mutations in residue layers surrounding designed binding
surfaces (114). We redesigned NovoTim1.0 in a similar manner (Fig. 18C; Table 5). In
NovoTim1.1, the thirteen original mutations were retained and nine additional ones were
identified by computational design, which increased the stability by 5°C. In
NovoTim1.2, only the three catalytic residues were retained, and the sequences of the
nine binding and nine interfacial residues were (re-)designed together. NovoTim1.2
stability is increased by 15°C, approaching that of the parent protein. NovoTim1.1 has
kinetic properties similar to those of NovoTim1.0, whereas in NovoTim1.2 k_cat and K_M
have each improved approximately two-fold (Fig. 18B; Table 6).

At least 95% of DHAP (GAP) is converted into GAP (DHAP) in the reaction
catalyzed by NovoTims, as judged by NADH (NAD+) production. The loss of enzyme
activity observed in single, double, and triple alanine mutants of NovoTim1.2 indicates
that all three designed catalytic residues make critical contributions to catalysis (Fig.
18C). The pH dependencies of the forward and reverse reactions catalyzed by
NovoTim1.2 are similar to those of wild-type TIM (Fig. 18D). These results show that the
desired reaction is predominant, the designed catalytic groups are key to the enzyme
mechanism, and the active site microenvironment approximates that of the naturally
evolved enzyme.

In E. coli, gluconeogenic growth on lactate or glycerol requires TIM activity
(Fig. 15A). Glycerol feeds into DHAP and places more stringent demands than lactate
on TIM activity, because elevated DHAP levels increase cytotoxic MG production,
which is mitigated through TIM-mediated conversion of DHAP into GAP (112).
Complementation of a TIM-deficient strain (104), DF502, by over-expressed
NovoTims was tested on both gluconeogenic substrates (115) in the presence and
absence of the inducer isopropyl-β-D-thiogalactopyranoside (IPTG). NovoTims 1.0 and
1.2 (1.1 not tested) support IPTG-dependent growth on lactate, but not glycerol.
NovoTim1.2 was further mutagenized by an error-prone polymerase chain reaction
(116), and mutants were selected on glycerol. Four isolates were obtained from
approximately 10^5 transformants. The different mutations in NovoTims 1.2.1-1.2.4 are
localized on the protein surface (Fig. 16C) and improve k_cat and K_M values, with the
largest changes corresponding to two-fold and three-fold increases in k_cat and k_cat/K_M
values, respectively.

We have successfully converted a protein devoid of catalytic activity into a
triose phosphate isomerase, using computational design techniques to predict 13 to 21
mutations that introduce three catalytically active residues together with a
stereochemically complementary substrate-binding surface. This minimalist design is based on key
short-range interactions observed in naturally evolved TIMs, and is sufficient to
increase the NovoTim-catalyzed reaction 10^5-fold to 10^6-fold over background. This
rate enhancement is the largest reported for rationally designed enzymes (73, 102).
NovoTim1.2 is sufficiently active to support growth under permissive gluconeogenic
conditions, and requires only small improvements to support full biological activity.
Nevertheless, the k_cat and k_cat/K_M values of NovoTim1.2.1 are 2,700-fold and 220-fold
less than those of wild-type TIM, whose apparent second-order rate constant approaches the
diffusion-limited encounter of enzyme with substrate (103). Alanine-scanning
mutagenesis indicates that all residues designed to be catalytically active contribute
significantly to rate enhancement. Furthermore, the electrostatic microenvironment as
probed by pH dependence of k_cat is similar to that of the wild-type enzyme. However, it is
likely that NovoTims have a sub-optimal hydrogen bond between the catalytic
glutamate and substrate C1 proton, which is a critical feature of the TIM reaction
mechanism (108-110) (we note that shortening of glutamate to aspartate in the
wild-type enzyme (117), presumably destroying the LBHB, results in a mutant with
activity similar to that of NovoTims). Elaboration of the minimalist mechanism in future designs will
allow testing of other contributions to rate enhancement, such as protein dynamics and
long-range electrostatics.

Rational design of enzymes is a stringent test of our understanding of protein
chemistry and has numerous potential applications. Here we present and experimentally
validate the computational design of enzyme activity in proteins of known structure.
We have predicted mutations that introduce triose phosphate isomerase activity into
ribose-binding protein, a receptor that is normally devoid of enzyme activity. The
resulting designs contain 18 to 22 mutations, exhibit 10^5-fold to 10^6-fold rate
enhancements over the uncatalyzed reaction, and are biologically active, supporting
growth of Escherichia coli under gluconeogenic conditions.

The combined placement of mechanistically critical residues with construction
of a surface that is stereochemically complementary to the entire substrate (and
product) is a critical aspect of the design method presented here. This capability was
absent in previously reported attempts at enzyme design (73) and is likely to be the
main reason for the much higher rate enhancements and apparent second-order rate
constants observed in this study. With the prediction accuracies now within reach of
computational protein design (11, 118), and introduction of increasing levels of
mechanistic detail and sophistication in future designs, this design process can be
extended to other substrates and reactions using our knowledge of catalytically active
residues and well-known principles of enzyme chemistry (122).

In stating a numerical range, it should be understood that all values within the
range are also described (e.g., one to ten also includes every integer value between one
and ten as well as all intermediate ranges such as two to ten, one to five, and three to
eight). The term "about" may refer to the statistical uncertainty associated with a
measurement or the variability in a numerical quantity which a person skilled in the art
would understand does not affect operation of the invention or its patentability.

All modifications and substitutions that come within the meaning of the claims
and the range of their legal equivalents are to be embraced within their scope. A claim
using the transition "comprising" allows the inclusion of other elements to be within the
scope of the claim; the invention is also described by such claims using the transitional
phrase "consisting essentially of" (i.e., allowing the inclusion of other elements to be
within the scope of the claim if they do not materially affect operation of the invention)
and the transition "consisting" (i.e., allowing only the elements listed in the claim other
than impurities or inconsequential activities which are ordinarily associated with the
invention) instead of the "comprising" term. Any of these three transitions can be used
to claim the invention.

It should be understood that an element described in this specification should 
not be construed as a limitation of the claimed invention unless it is explicitly recited in 
the claims. Thus, the granted claims are the basis for determining the scope of legal
protection instead of a limitation from the specification which is read into the claims. In
contradistinction, the prior art is explicitly excluded from the invention to the extent of 
specific embodiments that would anticipate the claimed invention or destroy novelty. 

Moreover, no particular relationship between or among limitations of a claim is
intended unless such relationship is explicitly recited in the claim (e.g., the arrangement
of components in a product claim or order of steps in a method claim is not a limitation
of the claim unless explicitly stated to be so). All possible combinations and
permutations of individual elements disclosed herein are considered to be aspects of the
invention. Similarly, generalizations of the invention's description are considered to be
part of the invention. From the foregoing, it would be apparent to a skilled person that
the invention can be embodied in other specific forms without departing from its spirit
or essential characteristics. The described embodiments should be considered only as
illustrative, not restrictive.

Table 2. Affinities of the Designed Receptors for Target Ligands and Analogs

[Table body not recovered; first column header: Receptor]

* The limit of detection for the nitro compounds corresponds to affinities of approximately 10 mM, and
for lactate analogues to 100 mM. Error of K_d measurement is approximately 10%.
Abbreviations: TNB, trinitrobenzene; DNT, dinitrobenzene; Pyr, pyruvate; Stn, serotonin; Trp,
L-tryptophan; Trm, tryptamine.

CLAIMS

1. A process for protein design in accordance with spatial and energy relationships 
between a proteinaceous receptor and a ligand, the process comprising: 

(a) generating a collection of ligand poses to provide a Docking Zone which 
represents potential conformation and degrees of freedom of the ligand relative 
to the receptor, 

(b) generating a collection of side-chain conformations on the receptor's backbone 
to provide an Evolving Zone which represents potential receptor mutants, 

(c) constructing a cost function from atomic interaction(s) between the ligand poses 
of the Docking Zone and the side chains of the Evolving Zone and between side 
chains of the Evolving Zone, and 

(d) selecting one or more combinations of single ligand pose and cognate receptor 
mutant which correspond to optimal or near-optimal values of the cost function 
to generate a collection of potential receptor mutants with ligand-binding sites, 
wherein the protein designed by the process is a potential receptor mutant. 

2. The process according to Claim 1 further comprising (e) rank-ordering ligand- 
binding sites of potential receptor mutants by a fitness metric prior to confirming 
whether or not one or more receptor mutants bind to the ligand or an analog thereof. 

3. The process according to Claim 2, wherein the fitness metric comprises one or 
more descriptors selected from the group consisting of a semi-empirical or universal 
force field, solvent-accessible area, cavity volume, ligand affinity, and ligand reactivity. 

4. The process according to any one of Claims 1-3, wherein only a subset of all 
possible combinations between ligand poses of the Docking Zone and side chains of the 
Evolving Zone in at least (d) are further evaluated. 

5. The process according to Claim 4 further comprising evaluation of the hydrogen 
bond inventory for at least one ligand pose of the Docking Zone. 

6. The process according to Claim 4 further comprising evaluation of a binding 
surface inventory for atomic interaction(s) between at least one ligand pose of the 
Docking Zone and at least one side chain of the Evolving Zone. 

7. The process according to Claim 1, wherein all possible combinations between 
ligand poses of the Docking Zone and side chains of the Evolving Zone in at least (d) 
are further evaluated. 

8. The process according to Claim 1 further comprising introducing additional 
mutations in the designed protein and selecting a re-designed protein for at least one of 
increased stability, increased affinity, and increased catalytic activity or enzyme 
turnover. 

9. A process for manufacturing a protein, wherein the process comprises 
expressing and isolating the one or more receptors predicted by any one of Claims 1-8 
to bind the ligand. 

10. A computer system, wherein the process of any one of Claims 1-8 is 
implemented as instructions for manipulating data by the computer system. 

11. A tangible medium, wherein the process of any one of Claims 1-8 is stored 
thereon as software. 

12. A protein designed by the process according to any one of Claims 1-8. 

13. A protein produced by the process according to Claim 9. 

14. The protein of Claim 13, wherein the protein is comprised of an amino acid 
sequence selected from the group consisting of mutant receptors listed in Tables 1, 3 
and 5. 

15. The protein of Claim 12 or 13, wherein the ligand confers allosteric regulation 
on protein activity. 

16. A catalyst comprised of the protein of Claim 12 or 13. 

17. An affinity or chiral purification reagent comprised of the protein of Claim 12 
or 13. 

18. A biosensor comprised of the protein of Claim 12 or 13. 

19. A nucleic acid which encodes the protein of Claim 12 or 13. 

20. An expression vector comprised of the nucleic acid of Claim 19. 

21. An engineered cell, tissue, or non-human organism which expresses the protein 
of Claim 12 or 13, or which is comprised of the nucleic acid of Claim 19 or the 
expression vector of Claim 20. 

22. The engineered cell, tissue, or non-human organism of Claim 21, wherein the 
protein is in at least one of a signal transduction pathway, a genetic circuit, or a 
metabolic pathway. 
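
As an illustration only, steps (a) through (d) of Claim 1 can be sketched in a few lines of Python. This is not the implementation described in the specification: the one-dimensional coordinates, the quadratic `pair_energy` term, and the names `docking_zone`, `evolving_zone`, `cost`, and `design` are hypothetical stand-ins chosen for brevity. The sketch scores every combination of ligand pose and side-chain choice (the exhaustive case of Claim 7) and keeps the lowest-cost combinations.

```python
from itertools import product

# Hypothetical toy data: every name and number here is illustrative only.
# Step (a), Docking Zone: candidate ligand poses, reduced to a 1-D position.
docking_zone = [0.0, 1.0, 2.0]

# Step (b), Evolving Zone: for each designable position, the candidate
# side chains as (residue name, 1-D side-chain position).
evolving_zone = [
    [("Ala", 1.5), ("Ser", 3.0)],   # choices at designable position 0
    [("Leu", -2.0), ("Asn", 0.5)],  # choices at designable position 1
]


def pair_energy(x, y, optimum=2.0):
    """Toy pairwise term: zero at the optimum separation, quadratic elsewhere."""
    return (abs(x - y) - optimum) ** 2


def cost(pose, side_chains):
    """Step (c): ligand/side-chain terms plus side-chain/side-chain terms."""
    ligand_terms = sum(pair_energy(pose, pos) for _, pos in side_chains)
    chain_terms = sum(
        pair_energy(a[1], b[1])
        for i, a in enumerate(side_chains)
        for b in side_chains[i + 1:]
    )
    return ligand_terms + chain_terms


def design(docking_zone, evolving_zone, keep=3):
    """Step (d): score every pose/mutant combination and keep the best few."""
    scored = [
        (cost(pose, chains), pose, tuple(name for name, _ in chains))
        for pose in docking_zone
        for chains in product(*evolving_zone)
    ]
    return sorted(scored)[:keep]


candidates = design(docking_zone, evolving_zone)
for energy, pose, mutant in candidates:
    print(f"cost={energy:.2f} pose={pose} mutant={mutant}")
```

The number of combinations grows multiplicatively with the number of poses and side-chain choices, so evaluating only a subset of combinations, as in Claim 4, would replace the full loop in `design`. The sorted list it returns is also a natural input for the rank-ordering of Claim 2, with the toy cost standing in for a fitness metric.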
Leu Leu Lys Gly Glu Pro Gly His Pro Asp Ala Glu Ala Arg Thr Thr 
145 150 155 160 

Tyr Val Ile Lys Glu Leu Asn Asp Lys Gly Ile Lys Thr Glu Gln Leu 
165 170 175 

Gln Leu Asp Thr Ala Met Trp Asp Thr Ala Gln Ala Lys Asp Lys Met 
180 185 190 

Asp Ala Trp Leu Ser Gly Pro Asn Ala Asn Lys Ile Glu Val Val Ile 
195 200 205 

Ala Asn Asn Asp Ala Met Ala Met Gly Ala Val Glu Ala Leu Lys Ala 
210 215 220 

His Asn Lys Ser Ser Ile Pro Val Phe Gly Val Asp Ala Leu Pro Glu 
225 230 235 240 

Ala Leu Ala Leu Val Lys Ser Gly Ala Leu Ala Gly Thr Val Leu Asn 
245 250 255 

Asp Ala Asn Asn Gln Ala Lys Ala Thr Phe Asp Leu Ala Lys Asn Leu 
260 265 270 

Ala Asp Gly Lys Gly Ala Ala Asp Gly Thr Asn Trp Lys Ile Asp Asn 
275 280 285 

Lys Val Val Arg Val Pro Tyr Val Gly Val Asp Lys Asp Asn Leu Ala 
290 295 300 

Glu Phe Ser Lys Lys 
305 



<210> 2 

<211> 271 

<212> PRT 

<213> Escherichia coli 

<400> 2 

Lys Asp Thr Ile Ala Leu Val Val Ser Thr Leu Asn Asn Pro Phe Phe 
1 5 10 15 

Val Ser Leu Lys Asp Gly Ala Gln Lys Glu Ala Asp Lys Leu Gly Tyr 
20 25 30 

Asn Leu Val Val Leu Asp Ser Gln Asn Asn Pro Ala Lys Glu Leu Ala 
35 40 45 

Asn Val Gln Asp Leu Thr Val Arg Gly Thr Lys Ile Leu Leu Ile Asn 
50 55 60 

Pro Thr Asp Ser Asp Ala Val Gly Asn Ala Val Lys Met Ala Asn Gln 
65 70 75 80 

Ala Asn Ile Pro Val Ile Thr Leu Asp Arg Gln Ala Thr Lys Gly Glu 
85 90 95 

Val Val Ser His Ile Ala Ser Asp Asn Val Leu Gly Gly Lys Ile Ala 
100 105 110 

Gly Asp Tyr Ile Ala Lys Lys Ala Gly Glu Gly Ala Lys Val Ile Glu 
115 120 125 

Leu Gln Gly Ile Ala Gly Thr Ser Ala Ala Arg Glu Arg Gly Glu Gly 
130 135 140 

Phe Gln Gln Ala Val Ala Ala His Lys Phe Asn Val Leu Ala Ser Gln 
145 150 155 160 

Pro Ala Asp Phe Asp Arg Ile Lys Gly Leu Asn Val Met Gln Asn Leu 
165 170 175 

Leu Thr Ala His Pro Asp Val Gln Ala Val Phe Ala Gln Asn Asp Glu 
180 185 190 

Met Ala Leu Gly Ala Leu Arg Ala Leu Gln Thr Ala Gly Lys Ser Asp 
195 200 205 

Val Met Val Val Gly Phe Asp Gly Thr Pro Asp Gly Glu Lys Ala Val 
210 215 220 

Asn Asp Gly Lys Leu Ala Ala Thr Ile Ala Gln Leu Pro Asp Gln Ile 
225 230 235 240 

Gly Ala Lys Gly Val Glu Thr Ala Asp Lys Val Leu Lys Gly Glu Lys 
245 250 255 

Val Gln Ala Lys Tyr Pro Val Asp Leu Lys Leu Val Val Lys Gln 
260 265 270 



<210> 3 

<211> 306 

<212> PRT 

<213> Escherichia coli 

<400> 3 

Glu Asn Leu Lys Leu Gly Phe Leu Val Lys Gln Pro Glu Glu Pro Trp 
1 5 10 15 

Phe Gln Thr Glu Trp Lys Phe Ala Asp Lys Ala Gly Lys Asp Leu Gly 
20 25 30 

Phe Glu Val Ile Lys Ile Ala Val Pro Asp Gly Glu Lys Thr Leu Asn 
35 40 45 

Ala Ile Asp Ser Leu Ala Ala Ser Gly Ala Lys Gly Phe Val Ile Cys 
50 55 60 

Thr Pro Asp Pro Lys Leu Gly Ser Ala Ile Val Ala Lys Ala Arg Gly 
65 70 75 80 

Tyr Asp Met Lys Val Ile Ala Val Asp Asp Gln Phe Val Asn Ala Lys 
85 90 95 

Gly Lys Pro Met Asp Thr Val Pro Leu Val Met Met Ala Ala Thr Lys 
100 105 110 

Ile Gly Glu Arg Gln Gly Gln Glu Leu Tyr Lys Glu Met Gln Lys Arg 
115 120 125 

Gly Trp Asp Val Lys Glu Ser Ala Val Met Ala Ile Thr Ala Asn Glu 
130 135 140 

Leu Asp Thr Ala Arg Arg Arg Thr Thr Gly Ser Met Asp Ala Leu Lys 
145 150 155 160 

Ala Ala Gly Phe Pro Glu Lys Gln Ile Tyr Gln Val Pro Thr Lys Ser 
165 170 175 

Asn Asp Ile Pro Gly Ala Phe Asp Ala Ala Asn Ser Met Leu Val Gln 
180 185 190 

His Pro Glu Val Lys His Trp Leu Ile Val Gly Met Asn Asp Ser Thr 
195 200 205 

Val Leu Gly Gly Val Arg Ala Thr Glu Gly Gln Gly Phe Lys Ala Ala 
210 215 220 

Asp Ile Ile Gly Ile Gly Ile Asn Gly Val Asp Ala Val Ser Glu Leu 
225 230 235 240 

Ser Lys Ala Gln Ala Thr Gly Phe Tyr Gly Ser Leu Leu Pro Ser Pro 
245 250 255 

Asp Val His Gly Tyr Lys Ser Ser Glu Met Leu Tyr Asn Trp Val Ala 
260 265 270 

Lys Asp Val Glu Pro Pro Lys Phe Thr Glu Val Thr Asp Val Val Leu 
275 280 285 

Ile Thr Arg Asp Asn Phe Lys Glu Glu Leu Glu Lys Lys Gly Leu Gly 
290 295 300 

Gly Lys 
305 



<213> Escherichia coli 

<400> 4 

Ala Asp Lys Lys Leu Val Val Ala Thr Asp Thr Ala Phe Val Pro Phe 
1 5 10 15 

Glu Phe Lys Gln Gly Asp Lys Tyr Val Gly Phe Asp Val Asp Leu Trp 
20 25 30 

Ala Ala Ile Ala Lys Glu Leu Lys Leu Asp Tyr Glu Leu Lys Pro Met 
35 40 45 

Asp Phe Ser Gly Ile Ile Pro Ala Leu Gln Thr Lys Asn Val Asp Leu 
50 55 60 

Ala Leu Ala Gly Ile Thr Ile Thr Asp Glu Arg Lys Lys Ala Ile Asp 
65 70 75 80 

Phe Ser Asp Gly Tyr Tyr Lys Ser Gly Leu Leu Val Met Val Lys Ala 
85 90 95 

Asn Asn Asn Asp Val Lys Ser Val Lys Asp Leu Asp Gly Lys Val Val 
100 105 110 

Ala Val Lys Ser Gly Thr Gly Ser Val Asp Tyr Ala Lys Ala Asn Ile 
115 120 125 

Lys Thr Lys Asp Leu Arg Gln Phe Pro Asn Ile Asp Asn Ala Tyr Met 
130 135 140 

Glu Leu Gly Thr Asn Arg Ala Asp Ala Val Leu His Asp Thr Pro Asn 
145 150 155 160 

Ile Leu Tyr Phe Ile Lys Thr Ala Gly Asn Gly Gln Phe Lys Ala Val 
165 170 175 

Gly Asp Ser Leu Glu Ala Gln Gln Tyr Gly Ile Ala Phe Pro Lys Gly 
180 185 190 

Ser Asp Glu Leu Arg Asp Lys Val Asn Gly Ala Leu Lys Thr Leu Arg 
195 200 205 

Glu Asn Gly Thr Tyr Asn Glu Ile Tyr Lys Lys Trp Phe Gly Thr Glu 
210 215 220 

Pro Lys 
225 



<210> 5 

<211> 238 

<212> PRT 

<213> Escherichia coli 

<400> 5 

Ala Ile Pro Gln Asn Ile Arg Ile Gly Thr Asp Pro Thr Tyr Ala Pro 
1 5 10 15 

Phe Glu Ser Lys Asn Ser Gln Gly Glu Leu Val Gly Phe Asp Ile Asp 
20 25 30 

Leu Ala Lys Glu Leu Cys Lys Arg Ile Asn Thr Gln Cys Thr Phe Val 
35 40 45 

Glu Asn Pro Leu Asp Ala Leu Ile Pro Ser Leu Lys Ala Lys Lys Ile 
50 55 60 

Asp Ala Ile Met Ser Ser Leu Ser Ile Thr Glu Lys Arg Gln Gln Glu 
65 70 75 80 

Ile Ala Phe Thr Asp Lys Leu Tyr Ala Ala Asp Ser Arg Leu Val Val 
85 90 95 

Ala Lys Asn Ser Asp Ile Gln Pro Thr Val Glu Ser Leu Lys Gly Lys 
100 105 110 

Arg Val Gly Val Leu Gln Gly Thr Thr Gln Glu Thr Phe Gly Asn Glu 
115 120 125 

His Trp Ala Pro Lys Gly Ile Glu Ile Val Ser Tyr Gln Gly Gln Asp 
130 135 140 

Asn Ile Tyr Ser Asp Leu Thr Ala Gly Arg Ile Asp Ala Ala Phe Gln 
145 150 155 160 

Asp Glu Val Ala Ala Ser Glu Gly Phe Leu Lys Gln Pro Val Gly Lys 
165 170 175 

Asp Tyr Lys Phe Gly Gly Pro Ser Val Lys Asp Glu Lys Leu Phe Gly 
180 185 190 

Val Gly Thr Gly Met Gly Leu Arg Lys Glu Asp Asn Glu Leu Arg Glu 
195 200 205 

Ala Leu Asn Lys Ala Phe Ala Glu Met Arg Ala Asp Gly Thr Tyr Glu 
210 215 220 

Lys Leu Ala Lys Lys Tyr Phe Asp Phe Asp Val Tyr Gly Gly 
225 230 235 